Hi Ruben,

On 20.06.2012, at 11:41, Rubén Pérez <[email protected]> wrote:
> I don't quite follow. How are those mediapackages invalid? When mediapackage
> elements are cleaned up, they are effectively taken out from the manifests
> and so on, so I don't know why there should be "pointers to files and
> catalogs that do not exist anymore".

Depending on where you put the cleanup operation in your workflow, you are
right and there are no dangling pointers. This is the case if you put that
operation before "archive" and "publish". If you put it after either of
these two (which is currently the case in the workflows we ship), the
mediapackages stored in either of these two systems will be invalid,
because they now point to files that are no longer there.

> I'm particularly against keeping intermediate "work" files, which are created
> in the middle steps of the workflow but never get distributed. Those files
> are a consequence of the specific implementation of the workflow and, should
> another workflow be run, they should be re-created as needed. After the
> workflow ends, in my view they're just garbage (which accounts for the fact
> that the name given to the operation that gets rid of those files is
> "cleanup").

I agree.

> If somehow the cleanup operation is not correctly deleting all the references
> to the deleted files, then the question is not skipping the cleanup operation
> altogether, because the need to save disk space is still there, and it's
> critical in most cases. The right way to go is fixing the cleanup operation,
> or whichever processes that are failing to update the broken references. If
> the problem is that the distributed files are not kept, then it's a question
> of changing the default workflow and tell the "cleanup" operation to keep
> those files also.

Saving disk space becomes important as you move from "let's take a look at
Matterhorn" to a production environment. At that point, you'll also have
more insight into what the workflows do, where they put files, and what
implications removing some of them during the cleanup operation may have.

This is why I suggest keeping these files by default and letting people
make a conscious decision about whether to throw them away, instead of
throwing them away by default and hoping that people make a conscious
decision to keep them. If your system is running out of disk space, you
will start thinking about strategies to overcome this issue. If your data
is gone by default, there is not a lot you can do in hindsight...

> As an adopter institution, the disk space consumed by Matterhorn (by our
> media content in general) is a critical issue. I won't vote on this until
> knowing about those "broken references" better, but the cleanup operation
> makes the disk consumption more efficient, and in general I'm against
> removing it completely from the default workflow.

It is a critical issue, there is no doubt about it. But the "out of the
box" experience of Matterhorn is critical too, and by throwing stuff away
during cleanup in an inconsistent way, we are more or less making sure that
people will run into errors when using the episode UI to do reprocessing of
any kind, or even simple retractions.

Tobias
