Re: [Opencast Matterhorn] Cleanup operation by default

Christopher Brooks Wed, 20 Jun 2012 14:19:50 -0700

Clarifying question,

> May I please ask everyone who is participating in this discussion to
> read [1], thanks! In short, the episode service aka "Archive" is a


I didn't realise archive service == episode service.  Reading the link,
I dislike that the two terms are used interchangeably.  I understand
now that the service handles both workflow management and the updating
and maintaining of canonical media packages.

> *persistent* storage for *mediapackages and their elements*. As soon

What do you mean by elements!?  You mean media packages and their
related artifacts?  Like the XML file of the media package and the .mp4
files of tracks?  Or do you mean something about elements inside of a
media package XML file?

> as the archive implementation is ready, there is no need to keep
> anything in the working file repository anymore and the cleanup

You mean no persistent need to keep things in the WFR, right?  The
worker nodes are still using the WFR for temporary "working" space,
correct?  I.e. the WFR isn't going to cease to exist.

> operation can indeed clean up everything from that place. The archive
> operation will take a similar configuration key that allows the
> workflow author to define what is going into the archive and what
> isn't. In short, the following will be true for the archive:
> 
> 1) The archive only stores what it is told to store (any combination
> of source file, intermediary files, distribution artifacts or nothing
> at all)
> 
> 2) The archived mediapackage will by default not reference any
> elements outside of the archive (unless the workflow author chooses
> to store intermediary files and distribution artifacts, too).
> 
> In general, all these services only store files if they are told to
> do so, so those that can't see a value in service decoupling and

Ok, so when this is done, there will be no calls to an archive outside
of the episode service since the two are the same thing, and I can just
delete calls to this service since we archive outside of the process at
the moment, right?  Nothing would break with this model?  Or are the
workflows that the capture agent starts launched from the episode
service?  E.g. would I go capture -> put in episode -> run workflow ->
remove from episode?  (in order to keep my own processing flow)

> > An archive service the way I envision it pushes raw media packages
> > away from MH.  E.g. to reprocess them you need to pull them out of
> > the archive.  Does someone want to drop the link to the wiki space
> > that describes plans for the archive service (I know there was
> > something floated by Tobias previously).
> 
> The data can't be moved away from Matterhorn, since then there is no
> way to pull it back for reprocessing reasons. The archive
> implementation will move the mediapackage xml and its elements to a
> directory on the file system (or any other backend that one is able
> to implement), which may be implemented as hierarchical storage or
> even tapes, but that will be transparent to Matterhorn, which will
> think of the file system as an asynchronous (at least in the final
> version) online store.

This is fine.  The organization of this directory structure is one I'd
be happy to be involved with, since how it relates to tapes may be
important (selective restoring of a tree of data, for instance, then
seeing specific things show up in the episode/archive ui).

> I would second that. As important however is the fact that the
> archive is a place where a system admin via the shell or mounted
> filesystem gets file system based access to the full mediapackage,
> meaning you can wrap it (tar or zip) and move it to other systems,
> backup devices etc.

We can make life easier by setting the directory structure as above,
and I would encourage this.  The worst thing we could do would be to
obscure this all within either the DB or SOLR, since pushing to tape
would be more difficult.

Also, I don't see on the link how I import a media package (from a
new capture agent, or from another university, etc.) into the
episode/archive service.  I would like to do this and this might let me
relax the requirement on the directory structure (since I could just
archive our stuff to tape the way we already are, then import it again
into the episode service to run more operations).

> In general I find it hugely important to acknowledge that there *are*
> institutions with archival concerns, and others without. Again, if

I think most of us have them, but the details of the concerns differ.

Chris
-- 
Christopher Brooks, BSc, MSc
ARIES Laboratory, University of Saskatchewan

Web: http://www.cs.usask.ca/~cab938
Phone: 1.306.966.1442
Mail: Advanced Research in Intelligent Educational Systems Laboratory
     Department of Computer Science
     University of Saskatchewan
     176 Thorvaldson Building
     110 Science Place
     Saskatoon, SK
     S7N 5C9
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn


To unsubscribe please email
[email protected]
_______________________________________________
