Hi Scott,
The tool "Metagenomic analyses -> Find diagnostic hits" can be used to
isolate the conserved sequences. Then use the tool "Join, Subtract
and Group -> Compare" with the "Non Matching rows of 1st dataset" option
to filter out anything that you think is spurious for your analysis (put
in the original file first and the output of diagnostic hits second)
before moving forward with the other summary tools.
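As a rough illustration of what the "Compare" step does with "Non
Matching rows of 1st dataset", here is a minimal Python sketch. It is not
Galaxy's implementation: the function name, the toy rows, and the
assumption that the read ID in the first column is the comparison key are
all hypothetical.

```python
def non_matching_rows(first_rows, second_rows, key_col=0):
    """Keep rows of first_rows whose key value is absent from second_rows."""
    # Collect the keys present in the second dataset (the diagnostic hits).
    second_keys = {row[key_col] for row in second_rows}
    # Keep only rows of the first dataset whose key was not seen there.
    return [row for row in first_rows if row[key_col] not in second_keys]


# Toy stand-ins for tab-separated rows: (read ID, hit label). Hypothetical data.
original = [("read1", "hitA"), ("read2", "hitB"), ("read3", "hitA")]
diagnostic_hits = [("read1", "hitA"), ("read3", "hitA")]

# Original file first, output of diagnostic hits second.
filtered = non_matching_rows(original, diagnostic_hits)
print(filtered)
```

In Galaxy itself you choose the comparison column in the tool form, so
no code is needed; the sketch only shows the logic.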
You will probably want to run the "Find diagnostic hits" tool more than
once. The choice is yours whether to do the "Compare" after each run, or
to "Text Manipulation -> Concatenate" all the results together first and
then "Compare" once. The first might work faster; it depends on the size
of your datasets (how much filtering occurred before this step, etc.).
The "Compare" tool sorts and holds data in memory. Even if you need to
break the data up and run in smaller chunks, the results should be the
same in the end. None of these jobs require the data to be in one lump.
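To make the chunking point concrete, here is a small Python sketch under
stated assumptions: the function names and toy data are hypothetical, the
first column is assumed to be the comparison key, and only the first
dataset is split while the second (the hits to subtract) is used whole
for every chunk. It is a stand-in for the logic, not the tool's code.

```python
def non_matching_rows(first_rows, second_rows, key_col=0):
    """Keep rows of first_rows whose key value is absent from second_rows."""
    second_keys = {row[key_col] for row in second_rows}
    return [row for row in first_rows if row[key_col] not in second_keys]


def non_matching_in_chunks(first_rows, second_rows, chunk_size, key_col=0):
    """Same filter, but applied to the first dataset one chunk at a time."""
    kept = []
    for start in range(0, len(first_rows), chunk_size):
        chunk = first_rows[start:start + chunk_size]
        # Each chunk is compared against the complete second dataset.
        kept.extend(non_matching_rows(chunk, second_rows, key_col))
    return kept


# Hypothetical toy data: ten reads, three of which are to be removed.
original = [(f"read{i}", "hit") for i in range(1, 11)]
diagnostic_hits = [("read2", "hit"), ("read5", "hit"), ("read9", "hit")]

all_at_once = non_matching_rows(original, diagnostic_hits)
in_chunks = non_matching_in_chunks(original, diagnostic_hits, chunk_size=4)

# Row order may differ between tools, so compare the sorted results.
print(sorted(all_at_once) == sorted(in_chunks))
```

Each row is kept or dropped based only on its own key, which is why
splitting the first dataset does not change which rows survive.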
Others are welcome to add to this with their own strategies; I am sure
there are other ways to do this. Some of the public servers
specializing in metagenomics may also have tools or options for this,
and some of those servers may have donated tools to the Tool Shed for
local or cloud use. May be worth a look.
http://wiki.galaxyproject.org/PublicGalaxyServers
Good question!
Jen
Galaxy team
On 9/18/13 7:03 AM, Scott W. Tighe wrote:
Dear Galaxy
When running a HiSeq shotgun metagenomics sample from the environment
against megablast and taxonomic representation, how do I filter/remove
all the 16S and other conserved sequences?
The problem is that when blasting a single organism that has a fraction
of conserved sequence, the results will align with E. coli 10,000 times
more than the possible target organism. This data would be wrong and
misleading. For example, a 100 mg sample that was negative for E. coli
using the MUG test gives thousands of hits with Galaxy.
1) Is there a "filter conserved sequences" setting?
2) Is there a "remove model organisms" setting?
Scott Tighe
--
Jennifer Hillman-Jackson
http://galaxyproject.org