Hi Ruben,

On 20.06.2012, at 11:41, Rubén Pérez <[email protected]> wrote:

> I don't quite follow. How are those mediapackages invalid? When mediapackage 
> elements are cleaned up, they are effectively taken out from the manifests 
> and so on, so I don't know why there should be "pointers to files and 
> catalogs that do not exist anymore". 

Depending on where you put the cleanup operation in your workflow, you are 
right and there are no dangling pointers. This is the case if you put that 
operation before "archive" and "publish". If you put it after either of these 
two (which is currently the case in the workflows we ship), the mediapackages 
stored in either of these two systems will be invalid because they now point 
to files that are no longer there.

> I'm particularly against keeping intermediate "work" files, which are created 
> in the middle steps of the workflow but never get distributed. Those files 
> are a consequence of the specific implementation of the workflow and, should 
> another workflow be run, they should be re-created as needed. After the 
> workflow ends, in my view they're just garbage (which accounts for the fact 
> that the name given to the operation that gets rid of those files is 
> "cleanup").

I agree.

> If somehow the cleanup operation is not correctly deleting all the references 
> to the deleted files, then the solution is not skipping the cleanup operation 
> altogether, because the need to save disk space is still there, and it's 
> critical in most cases. The right way to go is fixing the cleanup operation, 
> or whichever processes are failing to update the broken references. If 
> the problem is that the distributed files are not kept, then it's a question 
> of changing the default workflow and telling the "cleanup" operation to keep 
> those files as well.

Saving disk space becomes important as you are moving from "let's take a look 
at Matterhorn" to a production environment. At that point, you'll also have 
some more insight into what the workflows do, where they are putting files and 
what implications it may have to remove some of them as part of the cleanup 
operation. This is why I am suggesting that we keep these files by default and 
let people make a conscious decision on whether to throw them away, instead of 
throwing them away in the first place and hoping that people make a conscious 
decision to keep them.

If your system is running out of disk space, you will start thinking about 
strategies to overcome this issue. If your data is gone by default, there is 
not a lot you can do in hindsight...

> As an adopter institution, the disk space consumed by Matterhorn (by our 
> media content in general) is a critical issue. I won't vote on this until 
> knowing about those "broken references" better, but the cleanup operation 
> makes disk consumption more efficient, and in general I'm against
> removing it completely from the default workflow.

It is a critical issue; there is no doubt about it. But the "out of the box" 
experience of Matterhorn is critical too, and by throwing things away during 
cleanup in an inconsistent way, we are more or less making sure that people 
will run into errors when using the episode UI for any kind of reprocessing or 
even for simple retractions.

Tobias 
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn
_______________________________________________