Clarifying question,

> May I please ask everyone who is participating in this discussion to
> read [1], thanks! In short, the episode service aka "Archive" is a

I didn't realise archive service == episode service. Reading the link,
I dislike that the two terms are used interchangeably. I understand now
that the service handles both workflow management and the updating and
maintaining of canonical media packages.

> *persistent* storage for *mediapackages and their elements*. As soon

What do you mean by elements? Do you mean media packages and their
related artifacts, like the XML file of the media package and the .mp4
files of tracks? Or do you mean elements inside a media package XML
file?

> as the archive implementation is ready, there is no need to keep
> anything in the working file repository anymore and the cleanup

You mean no persistent need to keep things in the WFR, right? The
worker nodes are still using the WFR for temporary "working" space,
correct? I.e. the WFR isn't going to cease to exist.

> operation can indeed clean up everything from that place. The archive
> operation will take a similar configuration key that allows the
> workflow author to define what is going into the archive and what
> isn't. In short, the following will be true for the archive:
>
> 1) The archive only stores what it is told to store (any combination
> of source file, intermediary files, distribution artifacts or nothing
> at all)
>
> 2) The archived mediapackage will by default not reference any
> elements outside of the archive (unless the workflow author chooses
> to store intermediary files and distribution artifacts, too).
>
> In general, all these services only store files if they are told to
> do so, so those that can't see a value in service decoupling and

Ok, so when this is done, there will be no calls to an archive outside
of the episode service, since the two are the same thing, and I can
just delete calls to this service, since we archive outside of the
process at the moment, right? Nothing would break with this model?
Or are the workflows that the capture agent wants to start run from the
episode service? E.g. would I go capture -> put in episode -> run
workflow -> remove from episode (in order to keep my own processing
flow)?

> > An archive service the way I envision it pushes raw media packages
> > away from MH. E.g. to reprocess them you need to pull them out of
> > the archive. Does someone want to drop the link to the wiki space
> > that describes plans for the archive service (I know there was
> > something floated by Tobias previously).
>
> The data can't be moved away from Matterhorn, since then there is no
> way to pull it back for reprocessing reasons. The archive
> implementation will move the mediapackage xml and its elements to a
> directory on the file system (or any other backend that one is able
> to implement), which may be implemented as hierarchical storage or
> even tapes, but that will be transparent to Matterhorn, which will
> think of the file system as an asynchronous (at least in the final
> version) online store.

This is fine. The organization of this directory structure is something
I'd be happy to be involved with, since how it relates to tapes may be
important (selectively restoring a tree of data, for instance, then
seeing specific things show up in the episode/archive UI).

> I would second that. As important however is the fact that the
> archive is a place where a system admin via the shell or mounted
> filesystem gets file system based access to the full mediapackage,
> meaning you can wrap it (tar or zip) and move it to other systems,
> backup devices etc.

We can make life easier by setting the directory structure as above,
and I would encourage this. The worst thing we could do would be to
obscure this all within either the DB or SOLR, since pushing to tape
would be more difficult.

Also, I don't see on the link how I import a media package (from a new
capture agent, or from another university, etc.) into the
episode/archive service.
I would like to do this, and it might let me relax the requirement on
the directory structure (since I could just archive our stuff to tape
the way we already are, then import it again into the episode service
to run more operations).

> In general I find it hugely important to acknowledge that there *are*
> institutions with archival concerns, and others without. Again, if

I think most of us have them, but the details of the concerns differ.

Chris

--
Christopher Brooks, BSc, MSc
ARIES Laboratory, University of Saskatchewan

Web: http://www.cs.usask.ca/~cab938
Phone: 1.306.966.1442
Mail: Advanced Research in Intelligent Educational Systems Laboratory
      Department of Computer Science
      University of Saskatchewan
      176 Thorvaldson Building
      110 Science Place
      Saskatoon, SK
      S7N 5C9
_______________________________________________
Matterhorn mailing list
http://lists.opencastproject.org/mailman/listinfo/matterhorn
