Hello again,

First of all, thanks for your help; it has been very useful.

So far, I have copied this method into the Sequence class:

    def get_split_commands_sequential(is_compressed, input_name, output_name, start_sequence, sequence_count):
        """
        Does a brain-dead sequential scan & extract of certain sequences
        >>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
        ['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
        >>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
        ['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
        """
        # Each FASTQ record is assumed to be exactly 4 lines long.
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        # TODO: verify that tail can handle 64-bit numbers
        if is_compressed:
            cmd = 'zcat "%s" | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (input_name, start_line+1, line_count)
        else:
            cmd = 'tail -n +%s "%s" 2> /dev/null | head -%s' % (start_line+1, input_name, line_count)
        cmd += ' > "%s"' % output_name

        return [cmd]
    get_split_commands_sequential = staticmethod(get_split_commands_sequential)

This is what you suggested.
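
For reference, here is a minimal Python sketch of the line selection the generated pipeline performs. `tail -n +(start_line + 1)` skips the first `start_line` lines and `head -line_count` keeps the next `line_count` lines, which is what `itertools.islice` does on a line iterator. It assumes, like the `* 4` in the method, that every FASTQ record is exactly 4 lines; `select_records` and `extract_records` are hypothetical names of mine, not Galaxy functions.

```python
import gzip
import itertools

# Assumption: plain FASTQ with exactly 4 lines per record (no wrapped sequences).
LINES_PER_RECORD = 4


def select_records(lines, start_sequence, sequence_count):
    """Return the lines of records [start_sequence, start_sequence + sequence_count).

    Equivalent to `tail -n +(start_line + 1) | head -line_count`: skip the first
    start_line lines, then keep at most the next line_count lines.
    """
    start_line = start_sequence * LINES_PER_RECORD
    line_count = sequence_count * LINES_PER_RECORD
    return list(itertools.islice(lines, start_line, start_line + line_count))


def extract_records(path, start_sequence, sequence_count, is_compressed=False):
    """Hypothetical file-level wrapper; gzip.open plays the role of zcat."""
    opener = gzip.open if is_compressed else open
    with opener(path, "rt") as handle:
        return select_records(handle, start_sequence, sequence_count)
```

As with `tail | head`, asking for records past the end of the file simply returns fewer lines instead of raising an error.
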
When I run the tool with this configuration:

<tool id="bwa_mio" name="map with bwa">
  <description>map with bwa</description>
  <parallelism method="basic" split_size="3" split_mode="number_of_parts"></parallelism>

  <command>
      bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input > $output 2>/dev/null</command>
  <inputs>
    <param format="fastqsanger" name="input" type="data" label="fastq"/>
  </inputs>
  <outputs>
      <data format="sam" name="output" />
  </outputs>

  <help>
  bwa
  </help>

</tool>
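
With `split_mode="number_of_parts"` and `split_size="3"`, the input should be cut into 3 chunks, presumably each extracted with a command like the ones `get_split_commands_sequential` generates. I have not checked the exact arithmetic Galaxy uses to size the chunks; the sketch below assumes contiguous, near-equal chunks, and `part_ranges`, `split_commands` and the `partN.fastq` output names are all hypothetical.

```python
# Hypothetical helpers: Galaxy's own chunk-size arithmetic may differ.
LINES_PER_RECORD = 4  # assumption: 4-line FASTQ records


def part_ranges(total_sequences, number_of_parts):
    """Split total_sequences into contiguous (start_sequence, sequence_count) ranges.

    Sizes differ by at most one; the first `extra` parts get one more record.
    """
    if number_of_parts < 1:
        raise ValueError("number_of_parts must be at least 1")
    base, extra = divmod(total_sequences, number_of_parts)
    ranges = []
    start = 0
    for part in range(number_of_parts):
        count = base + (1 if part < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


def split_commands(input_name, total_sequences, number_of_parts):
    """Build one uncompressed `tail | head` command per part (output names are made up)."""
    commands = []
    for part, (start, count) in enumerate(part_ranges(total_sequences, number_of_parts)):
        commands.append('tail -n +%s "%s" 2> /dev/null | head -%s > "part%d.fastq"' % (
            start * LINES_PER_RECORD + 1, input_name, count * LINES_PER_RECORD, part))
    return commands


print(part_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

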
Everything finishes OK, but when I check the SAM output, I see that the read
names in the alignments are prefixed with the path of a file in the task
working directory, e.g. in example_split.sam:
/home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446 4 * 0 0 * * 0 0 TCTGGGTGAGGGAGTGGGGAGTGGGTTTTTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT ############################################################################ AS:i:0 XS:i:0

Do you know what may be going on?
If I don't split the file, everything works correctly.
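
The prefix sits on the QNAME, the first tab-separated column of each SAM record. A small helper like the one below could report which records carry a `path:` prefix and which task file each one points at, to see whether every split part is affected or only some. It assumes the unwanted prefix is an absolute path ending at the first ':', as in the record above; `qname_path_prefix` and `count_prefixed_reads` are hypothetical names.

```python
from collections import Counter


def qname_path_prefix(sam_line):
    """Return the 'path' part of a 'path:readname' QNAME, or None if absent.

    Assumption: the unwanted prefix is an absolute path ending at the first ':'.
    Header lines (starting with '@') have no QNAME and are skipped.
    """
    if sam_line.startswith("@"):
        return None
    qname = sam_line.split("\t", 1)[0]
    prefix, sep, _ = qname.partition(":")
    if sep and prefix.startswith("/"):
        return prefix
    return None


def count_prefixed_reads(sam_path):
    """Count prefixed records per source file in a (merged) SAM file."""
    counts = Counter()
    with open(sam_path) as handle:
        for line in handle:
            prefix = qname_path_prefix(line)
            if prefix is not None:
                counts[prefix] += 1
    return counts
```
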

Best regards


On 13 February 2015 at 13:39, Peter Cock <p.j.a.c...@googlemail.com> wrote:

> On Fri, Feb 13, 2015 at 11:38 AM, Nicola Soranzo <nsora...@tiscali.it>
> wrote:
> > Il 13.02.2015 03:17 Peter Cock ha scritto:
> >>
> >> Hi Roberto,
> >>
> >> It looks like this is a known issue with FASTQ splitting,
> >>
> >> https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism
> >>
> >> I originally broke it during a refactor, but it looks like the
> >> discussion died about what that method was meant to do
> >> (e.g. FQTOC = FASTQ table of contents?):
> >>
> >> https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4059dd7cde6fd2dc6#comment-820648
> >>
> >> I'm away from the office so can't try this, but probably all
> >> that is needed is to copy and paste the old method
> >> get_split_commands_sequential and the old method
> >> get_split_commands_with_toc (removed from the
> >> base Sequence class in the above commit) into the
> >> base Fastq class instead.
> >>
> >> Nicola - did you fix this locally after noticing the
> >> problem last year?
> >
> > No, sorry, we disabled Galaxy parallelism because it was using
> > too many cluster nodes.
> >
> > Nicola
>
> I had similar comments from some of the cluster users
> after getting it working here - but on balance a well used
> cluster helps justify future investment in maintaining it.
>
> Sorry about not following up on this - I think I might have
> assumed you would take care of it. Unfortunately I won't
> be able to test the obvious fix until at least a week from now...
>
> Peter
>
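
For reference, the move Peter describes relies on ordinary Python inheritance: a static method defined on a FASTQ base class is inherited by every FASTQ subtype, including the fastqsanger format used by the tool above, while other Sequence subclasses do not get it. The classes below are minimal stand-ins named after the classes mentioned in this thread (`FastqSanger` is my assumption for the fastqsanger datatype class), not Galaxy's actual definitions. The `@staticmethod` decorator is equivalent to the `name = staticmethod(name)` assignment in the copied method.

```python
class Sequence(object):
    """Minimal stand-in for the base sequence datatype (hypothetical)."""


class Fastq(Sequence):
    """Minimal stand-in for the base FASTQ datatype (hypothetical)."""

    @staticmethod
    def get_split_commands_sequential(is_compressed, input_name, output_name,
                                      start_sequence, sequence_count):
        # Each FASTQ record is assumed to occupy exactly 4 lines.
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        if is_compressed:
            cmd = 'zcat "%s" | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (
                input_name, start_line + 1, line_count)
        else:
            cmd = 'tail -n +%s "%s" 2> /dev/null | head -%s' % (
                start_line + 1, input_name, line_count)
        cmd += ' > "%s"' % output_name
        return [cmd]


class FastqSanger(Fastq):
    """Stand-in for the fastqsanger datatype; inherits the split helper."""


# The helper is reachable from the FASTQ subclass, but not from the base Sequence class.
print(FastqSanger.get_split_commands_sequential(
    False, "./input.fastq", "./output.fastq", start_sequence=10, sequence_count=10))
print(hasattr(Sequence, "get_split_commands_sequential"))  # False
```
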



-- 
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
