Hello again. First of all, thanks for your help; it has been very useful.
What I have done up to now is to copy this method to the class Sequence:

    def get_split_commands_sequential(is_compressed, input_name, output_name, start_sequence, sequence_count):
        """
        Does a brain-dead sequential scan & extract of certain sequences
        >>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
        ['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
        >>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
        ['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
        """
        start_line = start_sequence * 4
        line_count = sequence_count * 4
        # TODO: verify that tail can handle 64-bit numbers
        if is_compressed:
            cmd = 'zcat "%s" | ( tail -n +%s 2> /dev/null) | head -%s | gzip -c' % (input_name, start_line+1, line_count)
        else:
            cmd = 'tail -n +%s "%s" 2> /dev/null | head -%s' % (start_line+1, input_name, line_count)
        cmd += ' > "%s"' % output_name
        return [cmd]
    get_split_commands_sequential = staticmethod(get_split_commands_sequential)

This is something that you suggested.
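For reference, here is a minimal sketch of the line arithmetic that method relies on, assuming every FASTQ record is exactly four lines (no wrapped sequence lines). The helper names split_window and extract_records are hypothetical and not part of Galaxy; they only mirror what the tail/head pipeline is expected to select, so that a split part can be checked by hand.

```python
import itertools

# Assumption: strict four-line FASTQ records (header, sequence, '+', quality).
LINES_PER_RECORD = 4


def split_window(start_sequence, sequence_count):
    """Return (first_line, line_count) as used by `tail -n +N` and `head -N`."""
    first_line = start_sequence * LINES_PER_RECORD + 1  # tail -n +N is 1-based
    line_count = sequence_count * LINES_PER_RECORD
    return first_line, line_count


def extract_records(lines, start_sequence, sequence_count):
    """Pure-Python equivalent of the tail | head selection, for checking a part."""
    first_line, line_count = split_window(start_sequence, sequence_count)
    start = first_line - 1  # convert the 1-based line number to a 0-based index
    return list(itertools.islice(lines, start, start + line_count))
```

Passing the lines of the original FASTQ file to extract_records and comparing the result with one of the task_N parts should show whether the read names already differ in the split input, before bwa runs.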
When I run the tool with this configuration:

    <tool id="bwa_mio" name="map with bwa">
      <description>map with bwa</description>
      <parallelism method="basic" split_size="3" split_mode="number_of_parts"></parallelism>
      <command> bwa mem /home/ralonso/BiB/Galaxy/data/Cclementina_v1.0_scaffolds.fa $input > $output 2>/dev/null</command>
      <inputs>
        <param format="fastqsanger" name="input" type="data" label="fastq"/>
      </inputs>
      <outputs>
        <data format="sam" name="output" />
      </outputs>
      <help>
        bwa
      </help>
    </tool>

the job finishes without errors, but when I check the resulting SAM file, I see that the read names in the alignments are prefixed with the path of the split file. For example, in example_split.sam:

    /home/ralonso/galaxy-dist/database/job_working_directory/000/90/task_2/dataset_91.dat:SRR098409.1113446 4 * 0 0 * * 0 0 TCTGGGTGAGGGAGTGGGGAGTGGGTTTTTGAGGGTGTGTGAGGATGTGTAAGTGGATGGAAGTAGATTGAATGTT ############################################################################ AS:i:0 XS:i:0

Do you know what may be going on? If I don't split the file, everything works correctly.

Best regards

On 13 February 2015 at 13:39, Peter Cock <p.j.a.c...@googlemail.com> wrote:

> On Fri, Feb 13, 2015 at 11:38 AM, Nicola Soranzo <nsora...@tiscali.it> wrote:
> > On 13.02.2015 03:17 Peter Cock wrote:
> >>
> >> Hi Roberto,
> >>
> >> It looks like this is a known issue with FASTQ splitting,
> >>
> >> https://trello.com/c/qRHLFSzd/1522-issues-with-tasked-jobs-parallelism
> >>
> >> I originally broke it during a refactor, but it looks like the
> >> discussion died about what that method was meant to do
> >> (e.g. FQTOC = FASTQ table of contents?):
> >>
> >> https://bitbucket.org/galaxy/galaxy-central/commits/76277761807306ec2be3f1e4059dd7cde6fd2dc6#comment-820648
> >>
> >> I'm away from the office so can't try this, but probably all
> >> that is needed is to copy and paste the old method
> >> get_split_commands_sequential and the old method
> >> get_split_commands_with_toc (removed from the
> >> base Sequence class in the above commit) into the
> >> base Fastq class instead.
> >>
> >> Nicola - did you fix this locally after noticing the
> >> problem last year?
> >
> > No, sorry, we disabled Galaxy parallelism because it was using
> > too many cluster nodes.
> >
> > Nicola
>
> I had similar comments from some of the cluster users
> after getting it working here - but on balance a well used
> cluster helps justify future investment in maintaining it.
>
> Sorry about not following up on this - I think I might have
> assumed you would take care of it. Unfortunately I won't
> be able to test the obvious fix until at least a week later...
>
> Peter
>

--
Roberto Alonso
Functional Genomics Unit
Bioinformatics and Genomics Department
Prince Felipe Research Center (CIPF)
C./Eduardo Primo Yúfera (Científic), nº 3
(junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es