Hi Scott,

After some hard drive failures, I'm rebuilding our Galaxy server.
Something isn't quite right with our cluster integration yet, but it has
exposed a problem in Galaxy's handling of task splitting: it can
sometimes attempt to merge zero files.

Here is my fix for the BLAST XML format (now in the ToolShed),
https://bitbucket.org/peterjc/galaxy-central/changeset/5cb6411bad19802ba4001a083164366b42850a48

Here's an example using the plain text BLAST output format (-outfmt 0):

galaxy.jobs.splitters.multi ERROR 2012-10-18 16:26:21,330 Error merging files
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 133, in do_merge
    output_type.merge(output_files, output_file_name)
  File "/mnt/galaxy/galaxy-central/lib/galaxy/datatypes/data.py", line 545, in merge
    raise Exception('Result %s from %s' % (result, cmd))
Exception: Result 2 from cat  > /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat

The problem is that while "cat file1 ... fileN > merged" works fine
for one or more files, with no files cat falls back to reading stdin,
so it sits waiting for input (and, from the user's perspective, stalls).
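
To see why the traceback shows "cat  >" with two spaces, here is a
minimal sketch that rebuilds the command string the same way the merge
method does. The helper name build_cat_command is hypothetical, purely
for illustration:

```python
def build_cat_command(split_files, output_file):
    # Hypothetical helper mirroring the 'cat %s > %s' pattern in merge.
    # ' '.join([]) is the empty string, so no input operands remain.
    return 'cat %s > %s' % (' '.join(split_files), output_file)


print(build_cat_command(['part1.dat', 'part2.dat'], 'merged.dat'))
# -> cat part1.dat part2.dat > merged.dat
print(build_cat_command([], 'merged.dat'))
# -> cat  > merged.dat   (no file operands, so cat reads stdin instead)
```

An empty split_files list silently degrades into a valid shell command
with no file operands, which is why the failure only shows up at run time.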

This logic error is in the merge method of lib/galaxy/datatypes/data.py,
which could treat zero files either as an error or as a no-op. The
current code is:

        if len(split_files) == 1:
            cmd = 'mv -f %s %s' % ( split_files[0], output_file )
        else:
            cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
        result = os.system(cmd)

I think it should be something like this:

        if not split_files:
            # With no files, 'cat  > output' would block reading stdin
            raise Exception('Asked to merge zero files')
        elif len(split_files) == 1:
            cmd = 'mv -f %s %s' % ( split_files[0], output_file )
        else:
            cmd = 'cat %s > %s' % ( ' '.join(split_files), output_file )
        result = os.system(cmd)
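
For comparison, here is a rough pure-Python sketch of the same idea that
avoids os.system altogether and makes the error-versus-no-op choice
explicit. The function name merge_split_files and the allow_empty flag
are hypothetical, and swapping shutil in for the shell commands is my
own substitution, not how data.py currently works:

```python
import shutil


def merge_split_files(split_files, output_file, allow_empty=False):
    """Hypothetical merge helper: concatenate split_files into output_file.

    Uses shutil instead of shelling out to 'mv -f' / 'cat'.
    """
    if not split_files:
        if allow_empty:
            # Treat zero files as a no-op: leave output_file untouched.
            return
        # Otherwise fail loudly instead of running 'cat' with no operands.
        raise ValueError('Asked to merge zero files into %s' % output_file)
    if len(split_files) == 1:
        # Single part: move it into place, like 'mv -f'.
        shutil.move(split_files[0], output_file)
        return
    # Several parts: append each one in order, like 'cat a b ... > out'.
    with open(output_file, 'wb') as out:
        for name in split_files:
            with open(name, 'rb') as part:
                shutil.copyfileobj(part, out)
```

Raising a specific exception type keeps the "zero files" case
distinguishable from a genuine I/O failure during concatenation.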

It might also make sense to check for zero files in the code that calls
merge, i.e. the do_merge function in lib/galaxy/jobs/splitters/multi.py.
I'm still investigating how this comes about upstream; here is one clue:

galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:01,930 (273/510) state change: job is running
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,040 (273/510) state change: job finished, but failed
galaxy.jobs.runners.drmaa DEBUG 2012-10-18 16:25:03,074 Job output not returned from cluster
galaxy.jobs DEBUG 2012-10-18 16:25:03,074 task 641 for job 273 ended; exit code: 0
galaxy.jobs DEBUG 2012-10-18 16:25:03,148 task 641 ended
galaxy.jobs.runners.tasks DEBUG 2012-10-18 16:25:05,169 execution finished - beginning merge: tblastx -query "/mnt/galaxy/galaxy-central/database/files/000/dataset_127.dat"   -db "/var/local/blast/ncbi/nt" -query_gencode 2 -evalue 0.001 -out /mnt/galaxy/galaxy-central/database/files/000/dataset_304.dat -outfmt 0 -num_threads 8
galaxy.jobs.splitters.multi DEBUG 2012-10-18 16:25:05,181 files []
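
The "files []" line suggests a caller-side guard in do_merge would have
pointed straight at the real problem (no task outputs came back). A
minimal sketch of such a check, assuming the datatype's merge can be
passed in as a callable; the name checked_merge is hypothetical and this
is not Galaxy's actual do_merge:

```python
import logging

log = logging.getLogger(__name__)


def checked_merge(output_files, output_file_name, merge):
    """Hypothetical guard around a datatype merge(split_files, output_file).

    Fails fast when there is nothing to merge, instead of letting the
    datatype build a degenerate 'cat  > output' command.
    """
    if not output_files:
        msg = 'No task output files to merge into %s' % output_file_name
        log.error(msg)
        raise RuntimeError(msg)
    merge(output_files, output_file_name)
```

Checking in the caller as well as in merge means the error message can
describe the missing task outputs rather than a failed shell command.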

If you would prefer that small suggestion as a pull request, let me know.

Regards,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
