>> Another copy of the files? So now we've got the distribution copies,
>> in its own mediapackage; the working file repository copy, in its own
>> mediapackage too; the episode service copy, also in its own
>> mediapackage. And we are going to add *yet another* copy for the
>> archive service. So that makes four *different* mediapackages
>> refering to the very same recording! No wonder why it's so difficult
>> to keep track of the files.
>
> Knowing nothing of the episode service I must ask, does it have its own
> copy of media package resources outside of the WFR as Ruben suggests?

May I please ask everyone who is participating in this discussion to read [1], thanks!

In short, the episode service aka "Archive" is a *persistent* storage for *mediapackages and their elements*. As soon as the archive implementation is ready, there is no need to keep anything in the working file repository anymore, and the cleanup operation can indeed clean up everything from that place. The archive operation will take a similar configuration key that allows the workflow author to define what is going into the archive and what isn't.

The following will be true for the archive:

1) The archive only stores what it is told to store (any combination of source files, intermediate files, distribution artifacts, or nothing at all).

2) The archived mediapackage will by default not reference any elements outside of the archive (unless the workflow author chooses to store intermediate files and distribution artifacts, too).

In general, all these services only store files if they are told to do so. Those who can't see a value in service decoupling, or in keeping certain data in places that make sense in order to secure the long-term investment in the produced media, can simply remove the calls to the archive from the workflow and be done with it.

> An archive service the way I envision it pushes raw media packages away
> from MH. E.g. to reprocess them you need to pull them out of the
> archive. Does someone want to drop the link to the wiki space that
> describes plans for the archive service (I know there was something
> floated by Tobias previously).

The data can't be moved away from Matterhorn, since then there would be no way to pull it back for reprocessing.
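
To illustrate points 1) and 2) above, here is a rough Python sketch. It is not Matterhorn code: `Element`, `MediaPackage`, `archive`, the three `kind` categories and the `/archive` root are all made up for illustration. It only shows the idea that the workflow author names what goes into the archive, and that the archived mediapackage then references nothing but archived copies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    id: str
    kind: str  # hypothetical categories: "source", "intermediate", "distribution"
    uri: str


@dataclass
class MediaPackage:
    id: str
    elements: list = field(default_factory=list)


def archive(mp, keep_kinds, archive_root="/archive"):
    """Return a new mediapackage with only the elements the workflow asked for.

    Every kept element gets a URI inside the (hypothetical) archive root, so
    the archived mediapackage does not point at anything outside the archive.
    """
    kept = [
        Element(e.id, e.kind, f"{archive_root}/{mp.id}/{e.id}")
        for e in mp.elements
        if e.kind in keep_kinds
    ]
    return MediaPackage(mp.id, kept)


# Example: a workflow that archives the source recording only.
working = MediaPackage("mp-1", [
    Element("track-1", "source", "/wfr/mp-1/track-1"),
    Element("track-2", "intermediate", "/wfr/mp-1/track-2"),
    Element("track-3", "distribution", "/downloads/mp-1/track-3"),
])
archived = archive(working, keep_kinds={"source"})
nothing = archive(working, keep_kinds=set())
```

Passing an empty set is the "nothing at all" case, and a workflow that never calls `archive` simply has no archive copy.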

The archive implementation will move the mediapackage XML and its elements to a directory on the file system (or any other backend that one is able to implement). That backend may be hierarchical storage or even tapes, but this will be transparent to Matterhorn, which will think of the file system as an asynchronous (at least in the final version) online store.

>> sufficient? Why don't we keep things in synch with the "episode"
>> copy? I mean, what's the point of "archive" being an optional
>> operation you can include or not in your workflow? How wouldn't
>> anyone want to keep their recordings archived and ready for being
>> processed again for whatever reason which may come in the future?
>
> To me archive means slow storage, and everything else means fast
> storage. So archive is tape storage, or cheap but low performance disk
> arrays, while other things are on a high speed SAN with high
> availability.

I would second that. Just as important, however, is the fact that the archive is a place where a system admin gets file-system-based access to the full mediapackage via the shell or a mounted filesystem, meaning you can wrap it (tar or zip) and move it to other systems, backup devices etc.

In general I find it hugely important to acknowledge that there *are* institutions with archival concerns, and others without. Again, if archival copies are a waste of disk space to you, don't add the archive operation to the workflows and make sure the cleanup operation removes all your files past distribution. But let those live as well who consider storing their media for further processing.

Tobias
