>> Another copy of the files? So now we've got the distribution copies,
>> in its own mediapackage; the working file repository copy, in its own
>> mediapackage too; the episode service copy, also in its own
>> mediapackage. And we are going to add *yet another* copy for the
>> archive service. So that makes four *different* mediapackages
>> referring to the very same recording! No wonder why it's so difficult
>> to keep track of the files.
> 
> Knowing nothing of the episode service I must ask, does it have its own
> copy of media package resources outside of the WFR as Ruben suggests?

May I please ask everyone who is participating in this discussion to read [1],
thanks! In short, the episode service aka "Archive" is a *persistent* storage
for *mediapackages and their elements*. As soon as the archive implementation
is ready, there is no need to keep anything in the working file repository
anymore and the cleanup operation can indeed clean up everything from that
place. The archive operation will take a similar configuration key that allows
the workflow author to define what is going into the archive and what isn't.
The following will be true for the archive:

1) The archive only stores what it is told to store (any combination of source 
file, intermediary files, distribution artifacts or nothing at all)

2) The archived mediapackage will by default not reference any elements outside
of the archive (unless the workflow author chooses to store intermediary files
and distribution artifacts, too).
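
To make the two points a bit more concrete, here is a rough sketch in Python.
This is not Matterhorn code: the element kinds, the `archive()` function and
the `archive://` prefix are all made up for illustration, and the real
configuration key may well look different.

```python
from dataclasses import dataclass, field

# Hypothetical element kinds, named for this example only.
SOURCE = "source"
INTERMEDIATE = "intermediate"
DISTRIBUTION = "distribution"


@dataclass(frozen=True)
class Element:
    identifier: str
    kind: str  # one of the kinds above
    uri: str   # where the file currently lives


@dataclass
class MediaPackage:
    identifier: str
    elements: list = field(default_factory=list)


def archive(mediapackage, kinds_to_store, archive_root="archive://"):
    """Return an archived copy holding only elements of the requested kinds.

    Every element in the result points into the archive, so the archived
    mediapackage references nothing outside of it. The original
    mediapackage is left untouched.
    """
    archived = MediaPackage(mediapackage.identifier)
    for element in mediapackage.elements:
        if element.kind in kinds_to_store:
            uri = f"{archive_root}{mediapackage.identifier}/{element.identifier}"
            archived.elements.append(
                Element(element.identifier, element.kind, uri))
    return archived
```

Calling `archive(mp, {SOURCE})` would keep only the source files, while an
empty set stores nothing at all, which corresponds to point 1 above.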

In general, all these services only store files if they are told to do so.
Those who can't see a value in decoupling the services, or in keeping certain
data in places that make sense in order to secure the long-term investment in
the produced media, can simply remove the calls to the archive from the
workflow and be done with it.

> An archive service the way I envision it pushes raw media packages away
> from MH.  E.g. to reprocess them you need to pull them out of the
> archive.  Does someone want to drop the link to the wiki space that
> describes plans for the archive service (I know there was something
> floated by Tobias previously).

The data can't be moved away from Matterhorn, since there would then be no way
to pull it back for reprocessing. The archive implementation will move the
mediapackage xml and its elements to a directory on the file system (or any
other backend that one is able to implement). That directory may be
implemented as hierarchical storage or even tapes, but this will be
transparent to Matterhorn, which will think of the file system as an
asynchronous (at least in the final version) online store.
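
As a very rough sketch of what such a directory-based backend could look like
(Python, purely illustrative: the class name, the method names and the
`manifest.xml` file name are assumptions of mine, not the actual
implementation, and the sketch copies files rather than moving them so that
it is safe to try out):

```python
import shutil
from pathlib import Path


class DirectoryArchive:
    """Hypothetical file system backend: one directory per mediapackage."""

    def __init__(self, root):
        self.root = Path(root)

    def store(self, mediapackage_id, manifest_xml, element_files):
        """Write the manifest and copy the element files into the archive."""
        target = self.root / mediapackage_id
        target.mkdir(parents=True, exist_ok=True)
        (target / "manifest.xml").write_text(manifest_xml, encoding="utf-8")
        for source in element_files:
            shutil.copy2(source, target / Path(source).name)
        return target

    def retrieve(self, mediapackage_id):
        """Return the archived files, e.g. to pull them back for reprocessing."""
        target = self.root / mediapackage_id
        if not target.is_dir():
            raise KeyError(mediapackage_id)
        return sorted(p for p in target.iterdir() if p.is_file())
```

Another backend would only need to offer the same two operations, so whether
the directory sits on a SAN, on hierarchical storage or on tape stays hidden
from the caller.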

>> sufficient? Why don't we keep things in synch with the "episode"
>> copy? I mean, what's the point of "archive" being an optional
>> operation you can include or not in your workflow? How wouldn't
>> anyone want to keep their recordings archived and ready for being
>> processed again for whatever reason which may come in the future?
> 
> To me archive means slow storage, and everything else means fast
> storage.  So archive is tape storage, or cheap but low performance disk
> arrays, while other things are on a high speed SAN with high
> availability.

I would second that. Just as important, however, is the fact that the archive
is a place where a system admin gets file system based access to the full
mediapackage via the shell or a mounted filesystem, meaning you can wrap it
(tar or zip) and move it to other systems, backup devices etc.
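
For example, assuming each archived mediapackage lives in its own directory
(that layout and the function name below are assumptions for illustration),
wrapping one up takes only a few lines with Python's standard `tarfile`
module:

```python
import tarfile
from pathlib import Path


def wrap_mediapackage(directory, destination):
    """Pack an archived mediapackage directory into a gzipped tarball.

    The destination should live outside of the directory being packed.
    """
    directory = Path(directory)
    with tarfile.open(destination, "w:gz") as tar:
        # Keep the mediapackage directory name as the top-level entry.
        tar.add(str(directory), arcname=directory.name)
    return Path(destination)
```

The resulting tarball can then be copied to another system or a backup device
like any other file.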

In general I find it hugely important to acknowledge that there *are*
institutions with archival concerns, and others without. Again, if archival
copies are a waste of disk space to you, don't add the archive operation to
the workflows and make sure the cleanup operation removes all your files past
distribution. But let those who do consider storing their media for further
processing live as well.

Tobias