On Feb 16, 2012, at 5:15 AM, Peter Cock wrote:

> On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker <dannonba...@me.com> wrote:
>> Main still runs these jobs in the standard non-split fashion, and as a
>> resource that is occasionally saturated (and thus doesn't necessarily have
>> extra resources to parallelize to) will probably continue doing so as long
>> as there's significant overhead involved in splitting the files.  Fancy
>> scheduling could minimize the issue, but as it is during heavy load you
>> would actually have lower total throughput due to the splitting overhead.
> Because the splitting (currently) happens on the main server?

No, because the splitting process is work which has to happen somewhere.
Ignoring possible benefits from things that haven't been implemented yet, in a
situation where your cluster is saturated with work, you are unable to take
advantage of the parallelism, and splitting files apart only adds more work,
reducing total job throughput.  That splitting always happens on the head node
is not ideal, and it needs to be configurable.  I have a fork somewhere that
attempts to address this, but it needs work.

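The throughput argument can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Galaxy schedules or splits jobs: `makespan`, `workload`, and every number below are hypothetical. Jobs run on a fixed number of cluster slots using greedy list scheduling, and splitting is assumed to cost a fixed overhead per chunk (splitting input, merging output). It deliberately ignores the separate head-node bottleneck mentioned above and only shows that extra work costs throughput once every slot is already busy.

```python
import heapq

def makespan(durations, slots):
    """Greedy list scheduling: each task runs on whichever slot frees up first.
    Returns the time at which the last task finishes."""
    free_at = [0.0] * slots  # all zeros, so already a valid heap
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)

def workload(n_jobs, work, parts=1, overhead_per_part=0.0):
    """Hypothetical workload: n_jobs identical jobs, each costing `work` units.
    With parts > 1 every job is split into `parts` chunks, and each chunk pays
    an assumed fixed `overhead_per_part` for splitting and merging."""
    if parts == 1:
        return [float(work)] * n_jobs
    chunk = work / parts + overhead_per_part
    return [chunk] * (n_jobs * parts)

# Made-up numbers: 8 slots, jobs of 80 units, split 8 ways, 2 units overhead per chunk.
SLOTS, WORK, PARTS, OVERHEAD = 8, 80.0, 8, 2.0

for label, n_jobs in [("idle cluster", 1), ("saturated cluster", 64)]:
    plain = makespan(workload(n_jobs, WORK), SLOTS)
    split = makespan(workload(n_jobs, WORK, PARTS, OVERHEAD), SLOTS)
    print(f"{label}: unsplit {plain:g}, split {split:g}, "
          f"throughput {n_jobs / plain:.4f} vs {n_jobs / split:.4f} jobs/unit")
```

With these made-up numbers, a single job on an idle cluster finishes in 12 units instead of 80 when split, but 64 jobs on the saturated cluster take 768 units instead of 640, so throughput drops from 0.1000 to 0.0833 jobs per unit: the splitting overhead is pure extra work when there are no spare slots to absorb it.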
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
