Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

Peter Cock Thu, 16 Feb 2012 02:16:20 -0800

On Wed, Feb 15, 2012 at 6:07 PM, Dannon Baker wrote:
>
> Main still runs these jobs in the standard non-split fashion, and as a
> resource that is occasionally saturated (and thus doesn't necessarily have
> extra resources to parallelize to) will probably continue doing so as long
> as there's significant overhead involved in splitting the files.  Fancy
> scheduling could minimize the issue, but as it is during heavy load you
> would actually have lower total throughput due to the splitting overhead.
>


Is that because the splitting (currently) happens on the main server?

>> Regarding the merging of the output, I see there is a default merge
>> method in lib/galaxy/datatypes/data.py which just concatenates
>> the files. I am surprised at that - it seems like a very bad idea in
>> general - consider many binary files, or XML. Why not put this
>> as the default for text and subclasses thereof?
>
> I can't think of a better reasonable default behavior for "Data", though
> you're obviously right that each datatype subclass will need to define
> particular behaviors for merging files.

The default should raise an error (and better yet, refuse to do the
split in the first place). Zen of Python: In the face of ambiguity,
refuse the temptation to guess.
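
As a rough sketch of that idea: the base datatype refuses to merge, and
only text-like subclasses opt in to concatenation. All class and function
names here (Data, Text, Xml, supports_merge) are hypothetical and for
illustration only; this is not the actual code in
lib/galaxy/datatypes/data.py.

```python
import shutil


class Data:
    """Hypothetical base datatype: there is no safe generic merge."""

    @classmethod
    def merge(cls, split_files, output_file):
        # Refuse to guess: blind concatenation would corrupt binary or XML data.
        raise NotImplementedError(
            "%s does not define how to merge split files" % cls.__name__
        )


class Text(Data):
    """Hypothetical line-oriented datatype: concatenation is safe here."""

    @classmethod
    def merge(cls, split_files, output_file):
        # Append each part to the output in the order given.
        with open(output_file, "wb") as out:
            for name in split_files:
                with open(name, "rb") as part:
                    shutil.copyfileobj(part, out)


class Xml(Data):
    """Inherits the refusing default until a real merge is written."""


def supports_merge(datatype):
    # True only if the datatype overrides the refusing default, so a
    # scheduler could decline to split the job in the first place.
    return datatype.merge.__func__ is not Data.merge.__func__
```

With something like this, the job runner could check supports_merge()
on the output datatype before splitting, and fall back to the ordinary
non-split job when it returns False.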

Peter

