Hello,

Using the 'Subtract' tool between FASTQ datasets can be memory intensive, since it literally involves sorting and then comparing each character between the two files. This step is likely not necessary. I have seen queries such as yours run successfully on even very large datasets by eliminating the Subtract step and instead using 'Select' with 'NOT Matching' on the original dataset.


Example:

current dataflow:
1 - original file A
2 - select positive match expression 'X' to create file B
3 - subtract file B from file A to create file C

better:
1 - original file A
2 - select negative match expression 'X' to create file C
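
To illustrate why the shorter dataflow gives the same file C, here is a minimal Python sketch. It is not Galaxy's actual tool code: the function names, the sample reads, and the expression 'GGGG' are all made up for illustration, and the subtract function is a simplified stand-in for the real tool.

```python
import re
from typing import Iterable, Iterator, List


def select_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the lines that match the expression (positive select)."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def select_not_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the lines that do NOT match (negative select).

    Each line is tested once and then discarded, so memory use does not
    grow with the size of the input.
    """
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            yield line


def subtract(lines_a: Iterable[str], lines_b: Iterable[str]) -> List[str]:
    """Simplified stand-in for a whole-dataset subtract.

    All of file B has to be held in memory before file A can be filtered.
    """
    exclude = set(lines_b)
    return [line for line in lines_a if line not in exclude]


# Hypothetical two-column tabular data: identifier, sequence.
file_a = ["read1\tACGT", "read2\tGGGG", "read3\tACGA", "read4\tTTGGGG"]
expression = "GGGG"  # stands in for expression 'X'

# Current dataflow: positive select to get B, then subtract B from A.
file_b = list(select_matching(file_a, expression))
file_c_current = subtract(file_a, file_b)

# Better dataflow: a single negative select on A.
file_c_better = list(select_not_matching(file_a, expression))

print(file_c_current == file_c_better)
```

In this sketch both routes keep only read1 and read3, but the negative select never needs a second dataset to compare against.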

If this failure is on the public main Galaxy server and you do not wish to change your query, then moving to a cloud instance and experimenting with larger memory options is one suggestion: http://usegalaxy.org/cloud

Hopefully this helps,

Jen
Galaxy team

On 4/29/12 6:16 PM, Xianrong Wong wrote:
Hello, I am using the subtract (whole dataset) tool.  I converted my
fastq file to tabular with 2 columns:  1. Identifier and 2. sequence.  I
then "selected (a few) lines that match an expression" from this initial
tabular file and am trying to get a final dataset that is devoid of
reads with the few selected lines - thus I subtract the dataset of
selected lines from the initial dataset.  This tool works when I am
performing the workflow on a relatively small file (1/50 the size of a
whole sequencing experiment) but repeatedly fails when I input the full
fastq file.  Any idea why this is so?
Jose


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

--
Jennifer Jackson
http://galaxyproject.org