This machine is not clustered, so it isn't that. There has been no manual intervention either.

What sort of configuration issue would cause this?

Basically, workspace/collection holds 11G and files/collection holds 42M. They are not duplicates of each other. Nearly all of the extra space in workspace/collection is in ingest-temp.

I'll ask our developers here to take a look at the workflows. A casual glance does show, though, that the cleanup operation is in there.

The cleanup operation in both our repository and the main MH one is configured with:

<configuration key="preserve-flavors">*/source,dublincore/*</configuration>

That might explain all the stuff in the mediapackage directory.
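
To make the effect of that setting concrete, here is a minimal sketch of how a preserve-flavors value could be read, assuming each comma-separated entry is a type/subtype pattern in which "*" matches anything. This is only an illustration of that assumption, not Matterhorn's actual matching code; the function names and the sample flavors are made up for the example.

```python
def flavor_matches(pattern, flavor):
    """Return True if a 'type/subtype' flavor matches a pattern.

    Assumption: '*' in either half of the pattern matches any value.
    """
    p_type, p_sub = pattern.split("/", 1)
    f_type, f_sub = flavor.split("/", 1)
    return p_type in ("*", f_type) and p_sub in ("*", f_sub)


def preserved(flavors, preserve="*/source,dublincore/*"):
    """Return the flavors that a cleanup step would keep under this reading."""
    patterns = [p.strip() for p in preserve.split(",") if p.strip()]
    return [f for f in flavors if any(flavor_matches(p, f) for p in patterns)]


# Hypothetical sample flavors, for illustration only.
sample = [
    "presenter/source",
    "presenter/delivery",
    "dublincore/episode",
    "presentation/work",
]
kept = preserved(sample)
```

Under that reading, every source track and every Dublin Core catalog would be kept, which would be consistent with a large mediapackage directory.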

--
Jon

On 9/13/12 1:05 AM, Tobias Wunden wrote:
Hi Jonathan,

It is configured to be on the same disk.

Also, yes, some of them are hard links, namely workspace/mediapackage.

[root@worker-qa workspace]# du -hs *
11G     collection
296K    http_worker-qa.media.berkeley.edu
15G     mediapackage


There is 11G in workspace/collection that is not hard links.  The space 
calculation in my previous message excluded the hard linked files (i.e. did not 
count them twice) from the overall calculation.
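
One way to see which files are not hard linked is to walk the directory and look at each file's link count. The following is a rough sketch, assuming a POSIX filesystem where a link count of 1 means no other name points at the same data; the function name is made up for the example. It sums apparent file sizes, so the total can differ somewhat from what du reports in disk blocks.

```python
import os
import stat


def unlinked_usage(root):
    """Return (total_bytes, paths) for regular files under root whose
    link count is 1, i.e. files with no additional hard links."""
    total = 0
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                total += st.st_size  # apparent size, not allocated blocks
                paths.append(path)
    return total, paths
```

Running it against workspace/collection and sorting the returned paths by directory should show whether ingest-temp accounts for most of the space.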

From my understanding, if there are files in the workspace that are not hard 
linked, this points to either a) a configuration issue where the workspace and 
working file repository are not linked appropriately or b) manual 
intervention. When checking the configuration, make sure that *all* of your 
machines in the cluster use *the same* URL for the working file repository. So 
no default value like localhost:8008, but a URL pointing to *one* of your 
machines. This should be documented on the page that Chris pointed you at 
earlier. If that documentation isn't clear enough, please consider patching it.
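
The check described above can be sketched as a small comparison of one property across the configuration files of each machine. This is only an illustration: the property key is passed in by the caller because the exact name should be taken from the documentation, and the key and host names used in the example are assumptions, not values confirmed in this thread.

```python
def read_property(text, key):
    """Return the value of key from Java-style properties text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def repo_urls_agree(configs, key):
    """configs maps a host name to the text of its config file.

    Returns (agree, values): agree is True only if every host sets the
    key and all hosts use the same value.
    """
    values = {host: read_property(text, key) for host, text in configs.items()}
    distinct = set(values.values())
    return (len(distinct) == 1 and None not in distinct), values


# Assumed key name and hypothetical hosts, for illustration only.
KEY = "org.opencastproject.file.repo.url"
configs = {
    "admin.example.edu": KEY + "=http://admin.example.edu:8080\n",
    "worker.example.edu": "# default\n" + KEY + "=http://localhost:8080\n",
}
agree, values = repo_urls_agree(configs, KEY)
```

In this example the worker still points at localhost, so the check reports a mismatch.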

No, there is not a cleanup routine defined in the workflow... at least I don't 
think so.  I'm using the default workflows that ship with Matterhorn.  The problem 
is I don't know what to clean up.  Also, how do I add a cleanup routine?

The default workflow does ship with a "cleanup" operation at the very end, so 
you may want to do a checkout from the official Matterhorn SVN; you may be 
holding on to a modified copy in the UC Berkeley msub.

Also, does downloads contain copies of the original data plus the encoded data?

No, just the distribution versions + metadata needed by Engage.

Tobias
