Chris,

> Clarifying question,
>
>> May I please ask everyone who is participating in this discussion to
>> read [1], thanks! In short, the episode service aka "Archive" is a
>
> I didn't realise archive service == episode service. Reading the link,
> I dislike that the two terms are used interchangeably. I understand
> now that the service handles both workflow management, including the
> updating of and maintaining of canonical media packages.

The episode service is turning into a front end for the archive. We did
not have the time and resources to get everything done at once, hence
the iterative approach: first create a central place to store
mediapackages (without their elements), and only now implement the
actual archival functionality, with support for persisting the full
mediapackages, versioning, asynchronous access etc.

>> *persistant* storage for *mediapackages and their elements*. As soon
>
> What do you mean by elements!? You mean media packages and their
> related artifacts? Like the XML file of the media package and the .mp4
> files of tracks? Or do you mean something about elements inside of a
> media package XML file?

The mediapackage (the XML) and its elements (catalogs, tracks,
attachments).

>> as we archive implementation is ready, there is no need to keep
>> anything in the working file repository anymore and the cleanup
>
> You mean no persistent need to keep things in the WFR, right? The
> worker nodes are still using the WFR for temporary "working" space,
> correct? E.g. the WFR isn't going to cease to exist.

Yes. The working file repository will still be the working storage
during processing, but in theory, a workflow's *temporary* processing
artifacts should be gone from it once the workflow has finished.

>> operation can indeed clean up everything from that place. The archive
>> operation will take a similar configuration key that allows the
>> workflow author to define what is going into the archive and what
>> isn't. In short, the following will be true for the archive:
>>
>> 1) The archive only stores what it is told to store (any combination
>> of source file, intermediary files, distribution artifacts or nothing
>> at all)
>>
>> 2) The archived mediapackage will by default not reference any
>> elements outside of the archive (unless the workflow author chooses
>> to store intermediare files and distribution artifacts, too).
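
To make the two rules quoted above a bit more concrete, here is a rough
sketch in Python. It is not Matterhorn code: the Element class, the
"source" / "intermediate" / "distribution" labels, the function names
and the way URIs are rewritten below an archive root are all assumptions
made up for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a mediapackage element (catalog, track, attachment)."""
    element_id: str
    kind: str  # assumed labels: "source", "intermediate" or "distribution"
    uri: str


def select_for_archive(elements: Iterable[Element], archive_kinds: Set[str]) -> List[Element]:
    """Rule 1: keep only the kinds of elements the workflow author asked for."""
    return [e for e in elements if e.kind in archive_kinds]


def archived_mediapackage(elements: Iterable[Element], archive_kinds: Set[str],
                          archive_root: str) -> List[Element]:
    """Rule 2: the archived copy only references elements stored inside the archive.

    Elements that were not selected are dropped, and the URIs of the kept
    ones are rewritten to point below archive_root (an assumed layout).
    """
    kept = select_for_archive(elements, archive_kinds)
    return [Element(e.element_id, e.kind, f"{archive_root}/{e.element_id}") for e in kept]
```

With archive_kinds = {"source"}, only the source elements end up in the
archived mediapackage, and none of them point back into the working file
repository. With an empty set, nothing is stored at all.
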
>>
>> In general, all these services only store files if they are told to
>> do so, so those that can't see a value in service decoupling and
>
> Ok, so when this is done, there will be no calls to an archive outside
> of the episode service since the two are the same thing, and I can just
> delete calls to this service since we archive outside of the process at
> the moment, right?

Right. That's the beauty of Matterhorn's loose coupling of services
through the workflow definitions only (with a small number of
exceptions, of course).

> Nothing would break with this model? Or are the
> workflows that the capture agent wants to start run from the episode
> service? E.g. would I go capture -> put in episode -> run workflow ->
> remove from episode? (in order to keep my own processing flow)

I don't see why the capture agent would want to do that. Workflows that
involve the capture agent originate either from the scheduling service
or from ingest directly, and there is no need to involve the episode
service.

>> The data can't be moved away from Matterhorn, since then there is no
>> way to pull it back for reprocessing reasons. The archive
>> implementation will move the mediapackage xml and its elements to a
>> directory on the file system (or any other backend that one is able
>> to implement), which may be implemented as hierarchical storage or
>> even tapes, but that will be transparent to Matterhorn, which will
>> think of the file system as an asynchronous (at least in the final
>> version) online store.
>
> This is fine. The organization of this directory structure is one I'd
> be happy to be involved with, since how it relates to tapes is maybe
> important (selective restoring of a tree of data, for instance, then
> seeing specific things show up in the episode/archive ui).

That's ok, just start sending your thoughts to the list and make sure
Christoph is aware of them, as he is currently working on implementing
a first version.
In addition, the actual archival will be a replaceable piece, which
means you will be able to put your own implementation in place if you
don't like the way the (for now) default implementation does it.

>> I would second that. As important however is the fact that the
>> archive is a place where a system admin via the shell or mounted
>> filesystem gets file system based access to the full mediapackage,
>> meaning you can wrap it (tar or zip) and move it to other systems,
>> backup devices etc.
>
> We can make life easier by setting the directory structure as above,
> and I would encourage this. The worst thing we could do would be to
> obscure this all within either the DB or SOLR, since pushing to tape
> would be more difficult.

+100

> Also, I don't see on the link how I important a media package (from a
> new capture agent, or from another university, etc. etc.) into the
> episode/archive service. I would like to do this and this might let me
> relax the requirement on the directory structure (since I could just
> archive our stuff to tape the way we already are, then import it again
> into the episode service to run more operations).

Currently, I am not aware of any requirements in that direction.
Importing a mediapackage from other systems or capture agents is easily
done by configuring an inbox where you just drop the mediapackages,
together with a workflow that sticks them into the archive or does some
other processing (or both). However, importing through the archive does
make sense from my point of view.

Tobias
