Re: [Matterhorn-users] [ETS Operations] multiple worker machine and shared storage

Jonathan Felder Wed, 12 Sep 2012 13:12:01 -0700

Thanks for taking the time to answer my questions.

So are these settings all or nothing with regard to shared storage?

Is there any performance to be gained by having the workers pull files off of shared storage onto local storage, perform the workflows, and then put the results back onto shared storage? Is this configuration even possible or desirable?

Also, are you recommending different tiers of storage for the different settings? For example, should the workspace be on a separate mount with faster storage?


--
Jon

On 9/12/12 12:17 PM, Jaime Gago wrote:

Hi Jon,
From a bird's-eye view, all you need is for your worker and admin hosts to have access
to the same shared storage (with the right permissions and properly configured, of
course). That storage, let's say an NFS mount, needs to support hard links.

When Matterhorn initializes, it will try to determine whether the relevant storage supports
hard links. It will show in the logs like this:

 INFO (WorkspaceImpl:163) - Hard links between the working file repository and the workspace enabled
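If you want to test a mount by hand on a worker before starting Matterhorn, the same idea can be tried with a minimal Python sketch: create a scratch file in the working file repository directory and attempt to hard-link it into the workspace. The function name supports_hard_links and the probe file prefix are hypothetical; this is an illustration under those assumptions, not how WorkspaceImpl actually performs its check.

```python
import os
import tempfile


def supports_hard_links(repo_dir: str, workspace_dir: str) -> bool:
    """Return True if a file created in repo_dir can be hard-linked into workspace_dir.

    Hypothetical helper: mirrors the idea of Matterhorn's startup check,
    not its actual implementation.
    """
    # Create a throwaway source file in the working file repository directory.
    fd, source = tempfile.mkstemp(prefix=".hardlink-probe-", dir=repo_dir)
    os.close(fd)
    target = os.path.join(workspace_dir, os.path.basename(source))
    try:
        # os.link raises OSError when the link cannot be made,
        # e.g. EXDEV across filesystems or EPERM if links are not supported.
        os.link(source, target)
    except OSError:
        return False
    else:
        # The link worked; remove it again.
        os.remove(target)
        return True
    finally:
        # Always clean up the probe file in the repository directory.
        os.remove(source)
```

For example, call supports_hard_links("/nfs/mount/work/shared/files", "/nfs/mount/work/shared/workspace") on each host with the paths you configured in config.properties; False means Matterhorn would have to copy files between the two locations.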

In a properly configured shared storage setup with hard links detected, there is no
copying/processing/copying back; it's all done "in place".

Now, when it comes to configuration, these are the keys in config.properties that
need to point at shared, hard-link-enabled storage:

# The path to the repository of files used during media processing.
org.opencastproject.file.repo.path=/nfs/mount/work/shared/files

# The path to the working files (recommend using fast, transient storage)
org.opencastproject.workspace.rootdir=/nfs/mount/work/shared/workspace

In addition, a couple of other keys should point at shared storage:

# The directory where the matterhorn streaming app for Red5 stores the streams
org.opencastproject.streaming.directory=/nfs/mount/distribution/streams

# The directory to store media, metadata, and attachments for download from the engage tool
org.opencastproject.download.directory=/nfs/mount/distribution/downloads
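A quick sanity check that these four keys are set and point under the shared mount could look like the Python sketch below. It assumes plain key=value lines; real .properties files also allow ':' separators, escapes, and line continuations, which the hypothetical parse_properties helper ignores. The /nfs/mount prefix is just the example mount used above, and the helper names are illustrative, not part of Matterhorn.

```python
from pathlib import PurePosixPath

# The shared-storage keys discussed in this thread.
SHARED_STORAGE_KEYS = (
    "org.opencastproject.file.repo.path",
    "org.opencastproject.workspace.rootdir",
    "org.opencastproject.streaming.directory",
    "org.opencastproject.download.directory",
)


def parse_properties(text: str) -> dict:
    """Simplified parser: 'key=value' lines and '#'/'!' comments only (no escapes or continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=' in this simplified sketch
            props[key.strip()] = value.strip()
    return props


def missing_shared_keys(props: dict) -> list:
    """Return the shared-storage keys that are absent or empty."""
    return [k for k in SHARED_STORAGE_KEYS if not props.get(k)]


def keys_outside_mount(props: dict, mount: str) -> list:
    """Return the keys whose configured path is not at or under the given mount point."""
    root = PurePosixPath(mount)
    outside = []
    for key in SHARED_STORAGE_KEYS:
        value = props.get(key)
        if not value:
            continue  # reported by missing_shared_keys instead
        path = PurePosixPath(value)
        if path != root and root not in path.parents:
            outside.append(key)
    return outside
```

Run it against each host's config.properties; an empty result from both checks means every key is set and lives under the shared mount (it does not prove the mount itself is shared or supports hard links).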


All this and more was documented by Tobias Wunden (Entwine CTO) here:

http://opencast.jira.com/wiki/display/MH/Sample+Distributed+Installation

http://opencast.jira.com/wiki/display/MH/SAMPLE+Customization

"[...]
The workspace directory (org.opencastproject.workspace.rootdir) will ideally be 
shared amongst the nodes of a system. Any time one of the Matterhorn services 
needs to work on a certain piece of media, the service will first download the 
file to the workspace and then start processing. Now as a recording travels 
through the system to be processed, each media track and metadata file will be 
touched multiple times by different services. If the workspace is shared, 
download occurs only once instead of multiple times.

The biggest performance gain can be achieved by putting both the working file 
repository's storage directory and the shared workspace on one single network 
volume. This means that there will be no downloading from the working file 
repository to the workspace but hard linking, which can be done in a blink of 
an eye.
[...]
"
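The single volume matters because a hard link can only be created within one filesystem. A cheap precondition check is to compare the device IDs of the repository and workspace directories, as in this Python sketch (same_filesystem is a hypothetical helper, not part of Matterhorn). Matching device IDs are necessary but not sufficient, so an actual link attempt is still the definitive test.

```python
import os


def same_filesystem(repo_dir: str, workspace_dir: str) -> bool:
    """Return True if both directories report the same device ID (st_dev).

    Hard links cannot cross filesystem boundaries, so a False result rules
    them out; a True result does not prove the mount allows hard links.
    """
    return os.stat(repo_dir).st_dev == os.stat(workspace_dir).st_dev
```

For example, same_filesystem("/nfs/mount/work/shared/files", "/nfs/mount/work/shared/workspace") should be True in the layout quoted above, where both directories sit on one network volume.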


*************
Jaime Gago
Systems Engineer
[email protected]
@JaimeGagoTech

Entwine - Knowledge In Motion
www.entwinemedia.com
@entwinemedia


On Sep 4, 2012, at 1:50 PM, Jonathan Felder wrote:

Do the workers pull the files off the NFS mount, do their thing, and then place
the completed files back on the mount? If not, how is the latency? I'd expect
there to be significant performance considerations as the number of workers
increases.

How is all of this configured?

--
Jon

On 9/4/12 1:28 PM, Christopher Brooks wrote:

Jon,

I assume Adam can answer this in more detail from our end, but we run
multiple workers with a single admin.  The workers can request all of
the files from the admin and hand files back, but even better is to
have them all use the same shared storage.  Thus a request for a file
is dealt with by having the worker just look on the NFS mount instead
of looking at a REST endpoint.

Chris

On Tue, 4 Sep 2012 13:06:57 -0700
Jonathan Felder <[email protected]> wrote:

Has anyone tried a configuration using multiple worker machines?

How do you handle the file management?  Do all of the workers utilize
shared storage with the admin server or can the admin server hand
files to the workers and the workers send back completed workflows?

--
Jon
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
