Re: [galaxy-user] Metagenomic filtering

Jennifer Jackson Tue, 24 Sep 2013 11:58:22 -0700

Hi all -

Not to derail the conversation, but I wanted to point out some Galaxy resources that may help when considering how to approach a solution. These may be knowns, but I thought I'd put them out there just in case. See below.

Best!
Jen
Galaxy team

There are at least three public Galaxy instances that focus heavily on metagenomics. Maybe worth a look?

http://wiki.galaxyproject.org/PublicGalaxyServers

Just do a browser search on "metagenomics" to find them on the page. There may be others, but these are the top 3.

The Tool Shed may or may not contain specialized tools from these servers. Asking to have those tools made available via the Tool Shed can be done through direct contact. Other repos may have tools that fit or could be tuned. Tool authors own their tools, so changes could potentially be incorporated through direct contact. Or, as this is open source, a tool could be used as a baseline with attribution if that doesn't work out.

http://toolshed.g2.bx.psu.edu/

Making a Galaxy Trello ticket for new tools and discussing new tool development on the [email protected] list may help you find other Galaxy community developers working on similar projects. Tickets are not just for the Galaxy core team, and even though the issue to solve is scientific, a technical implementation seems to be where this is going (new tool or existing tool tuning).

http://wiki.galaxyproject.org/Issues -> Inbox is where this would go.

The final home is almost certainly the Tool Shed (same for all tools), but the possibility of also including it on the Galaxy Main server exists once there is a valid repo and it is determined to be a good fit (resources, etc.).


On 9/24/13 7:17 AM, Scott Tighe wrote:

Jing et al
Thank you for the offer to write some code to help advance the metagenomics arena. It is certainly needed.
The problem with megablast and shotgun metagenomics is well known: without proper understanding and the correct software, it will yield very misleading and in many cases incorrect data. Those of us who wish NOT to move to a protein level of comparison, for specific reasons, are stuck.
*The Problem:*
If I megablast 50 million sequences from a HiSeq run, millions of rRNA sequences will have a 99% match to the rRNA GenBank deposits of all microbes. Not surprising, since rRNA is highly conserved. The difference between E. coli and Shigella is 1 to 2 bases over the full 1540 bp 16S. So 16S is not useful at the genus level, and certainly not at the species level.
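
As a rough check on the figures above, the arithmetic can be done in a few lines of Python. This is only an illustrative sketch: the helper name percent_identity is made up here, and it assumes an ungapped alignment over the full 1540 bp.

```python
def percent_identity(mismatches: int, aligned_length: int) -> float:
    """Percent identity of an ungapped alignment (hypothetical helper)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Figures quoted in the message: 1 to 2 differing bases over 1540 bp of 16S.
for mismatches in (1, 2):
    print(mismatches, round(percent_identity(mismatches, 1540), 2))
# prints:
# 1 99.94
# 2 99.87
```

Both values sit above a 99% cutoff, so a threshold at that level cannot tell the two apart on 16S alone.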
*So what happens:*
The returned matches will have many hits to whatever model organism is in GenBank. For example, E. coli has 13000 entries for rRNA and Sphaerotilus has 3 entries for rRNA. If the blasted sequence matches both, the results will mislead the investigator into thinking they have 13000 hits to E. coli, EVEN if the microbe is Sphaerotilus.
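
The counting bias described here can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the function names are hypothetical, the input is taken to be (read, organism) pairs already extracted from the BLAST results, and the toy data is invented (one read, with the database imbalance scaled down to 5 entries versus 1).

```python
from collections import Counter, defaultdict


def hits_per_organism(hits):
    """Count every hit, so heavily deposited organisms are inflated."""
    return Counter(organism for _read, organism in hits)


def reads_per_organism(hits):
    """Count each read at most once per organism it matched."""
    organisms_by_read = defaultdict(set)
    for read, organism in hits:
        organisms_by_read[read].add(organism)
    counts = Counter()
    for organisms in organisms_by_read.values():
        counts.update(organisms)
    return counts


# Invented toy data: one read matching 5 E. coli entries and 1 Sphaerotilus entry.
toy_hits = [("read1", "E. coli")] * 5 + [("read1", "Sphaerotilus")]

print(hits_per_organism(toy_hits)["E. coli"])        # 5
print(reads_per_organism(toy_hits)["E. coli"])       # 1
print(reads_per_organism(toy_hits)["Sphaerotilus"])  # 1
```

Counting per read removes the inflation from database size, but it still leaves the read assigned to both organisms, which is the ambiguity the rest of the message is concerned with.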
*The cure?:*
What if there was a way to filter/remove all such hits? Let's say, for example, that a result has a first match (say E. coli) at >99%, a second match (say Pseudomonas) at >99%, and third, fourth and fifth matches at >99% for three other organisms. This sequence _must_ be discarded because it is a conserved sequence.
Basically, conserved sequence is the enemy and invalidates the entire result.
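
One possible reading of the filter described above, as a minimal Python sketch rather than an existing Galaxy or NCBI tool. It assumes the hits have already been reduced to (read, organism, percent identity) triples; the function name, the parameter names and the toy data are all hypothetical.

```python
from collections import defaultdict


def drop_conserved_reads(hits, min_identity=99.0, max_organisms=1):
    """Split read ids into (kept, discarded).

    A read is discarded when more than max_organisms distinct organisms
    match it above min_identity (assumed thresholds, not from any tool).
    """
    strong_matches = defaultdict(set)
    seen = set()
    for read, organism, pident in hits:
        seen.add(read)
        if pident > min_identity:
            strong_matches[read].add(organism)
    discarded = {
        read
        for read, organisms in strong_matches.items()
        if len(organisms) > max_organisms
    }
    return seen - discarded, discarded


# Invented toy data for illustration only.
toy_hits = [
    ("read1", "E. coli", 99.8),
    ("read1", "Pseudomonas", 99.5),
    ("read2", "Sphaerotilus", 99.9),
    ("read2", "E. coli", 92.0),
]
kept, discarded = drop_conserved_reads(toy_hits)
print(sorted(kept))       # ['read2']
print(sorted(discarded))  # ['read1']
```

Raising max_organisms would make the filter more permissive; whether 99% and a limit of one organism are sensible defaults is an open question for real data.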
*Another problem:*
If you have a reference sample with 19 non-model microbes, and you run that by HiSeq shotgun for metagenomics and then megablast, what do you think you get? If E. coli is not in the reference sample, how many hits to it do you think you get? Yes, tens of thousands. So without removing conserved sequences, your data is wrong and you are much better served by culturing and running a Biolog metabolic panel and comparing to the sequence result.
So where do we start? I have some shotgun metagenomics data from the reference sample, which included the 19 microbes. That was data from a MiSeq.
Scott
Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont ave
Health Science Research Facility 303/305
Burlington Vermont 05405
802-656-2557
On 9/20/2013 9:17 PM, Jing Yu wrote:
Hi Scott,
I can do some Perl programming, such as local/remote blasting. Can you specify your problem a little more clearly, so that maybe I can write a program to do just that?
Regards,
Jing




Gerald
16S is basically useless for identification to genus. Since I started sequencing 16S in 1992, I have come to realize that without sequencing the full 1540 bases, it is generally misleading, and even then, it is not accurate enough to nail genus in more than 1/2 the cases. However, what is your feeling on ITS and gyrase? They seem to be far more discriminating, but those databases were decommissioned some time ago.
The desirable thing would be that Galaxy or NCBI add a "filter conserved genes" option [i.e. any hit with a second choice greater than 3% distance]. Something such as that.
If you (or others) are aware of such a thing, I'd love to hear about it.
Sincerely
Scott
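
The "filter conserved genes" idea in the message above might be sketched as follows in Python. This is one interpretation, not an existing Galaxy or NCBI feature: it assumes "3% distance" means a gap of 3 percentage points of identity between the best-matching organism and the next-best organism, and the function name and inputs are hypothetical.

```python
def passes_margin_filter(hits, min_gap=3.0):
    """Decide whether one read's hits are discriminating enough to keep.

    hits: list of (organism, percent_identity) pairs for a single read.
    Returns True when the best organism beats the runner-up organism by
    more than min_gap percentage points (an assumed interpretation).
    """
    best_by_organism = {}
    for organism, pident in hits:
        if pident > best_by_organism.get(organism, float("-inf")):
            best_by_organism[organism] = pident
    if not best_by_organism:
        return False  # no hits at all, nothing to assign
    ranked = sorted(best_by_organism.values(), reverse=True)
    if len(ranked) == 1:
        return True  # only one organism matched
    return ranked[0] - ranked[1] > min_gap


# Invented toy data for illustration only.
print(passes_margin_filter([("A", 99.5), ("A", 99.0), ("B", 95.0)]))  # True
print(passes_margin_filter([("A", 99.5), ("B", 99.2)]))               # False
```

Multiple database entries for the same organism are collapsed to that organism's best hit first, so a heavily deposited organism does not count as its own runner-up.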
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org

