Re: [galaxy-user] Transcriptome Hypericum perforatum

Jennifer Jackson Mon, 25 Nov 2013 18:44:39 -0800

Hello,

Interesting genome. I see that SRA has some public RNA-seq data, but there isn't much else going on. And your goal is to characterize the expression for observed phenotypes (linked to known genotypes)? If you use the Tuxedo suite after assembly (Trinity or other), differential expression of alternative splicing is one of the discovery outputs.

From my experience (and others are welcome to add comments), most SNP differences (_single_ base polymorphisms) do not in general impact the global assembly of whole genome data. Larger insertions/deletions are where you will observe differences. But that is DNA.

For transcription assembly, including RNA-seq, novel isoforms per sample and in particular rare events like SNPs, can become diluted when multiple samples are directly combined and assembled together straight de-novo. Still, obtaining full length cDNAs is certainly possible. And it has been done just about the same way, with various types of RNA data, for a very long time (most of RefSeq started out that way). The downside here is that "the most common variant" can overwhelm, but with a plant you might have that issue anyway depending on ploidy. So, test for yourself. Genomes can vary and the tools are so interesting - "same way" is a gross generalization on my part, in specifics the tools are very sophisticated.

And, most importantly, as you do have a reference genome to use as a guide (and that is really an invaluable tool not to be ignored) be sure to incorporate it unless it is from a sample that is known to be significantly, unacceptably, different from the wildtype. It sounds like the quality has been assessed to be unacceptable to use directly as a reference genome for some reason (correct? Or, you just want to build up the cDNA set - great project!). But the genome can still be utilized. Specifically - using it as an early stage assembly guide will give you a huge advantage, in my opinion (some assemblers cluster the data first by mapping - you want this if possible). But again, you could try it both ways and check out a few genes to see how the transcript profile worked out (vs any knowns - comparative OK, I always used these when I did this type of work), plus use the truth metrics (to me) of transcription assembly: how many singletons did you end up with (and what do they map to! can they really be ignored?) & how many over-clustered "genes" did you get (interesting, sparser genes gobbled up by abundant housekeeping). Under-clustered genes/transcripts or incomplete transcripts are other factors, but depending on how you set the parameters in Cufflinks, this may be less important, if it isn't a pathological problem.
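
As a rough illustration of the two checks above (singletons and over-clustered "genes"), here is a minimal Python sketch. It is not tied to the output of any particular assembler: the function name `summarize_clusters`, the two input dictionaries, and the example IDs are all hypothetical, and it assumes you have already exported, by whatever means, which sequences ended up in each cluster and which known genes each cluster matches.

```python
def summarize_clusters(cluster_members, cluster_known_genes):
    """Summarize an assembly clustering (hypothetical input layout).

    cluster_members:     dict of cluster ID -> list of member sequence IDs
    cluster_known_genes: dict of cluster ID -> list of known gene names
                         that the cluster's members match (assumed to come
                         from a separate comparison against known genes)
    """
    # A singleton is a cluster holding exactly one sequence.
    singletons = sorted(
        cid for cid, members in cluster_members.items() if len(members) == 1
    )
    # A cluster matching more than one distinct known gene is flagged
    # as possibly over-clustered; it still needs a manual look.
    over_clustered = sorted(
        cid for cid, genes in cluster_known_genes.items() if len(set(genes)) > 1
    )
    return {
        "n_clusters": len(cluster_members),
        "singletons": singletons,
        "over_clustered": over_clustered,
    }


# Made-up example data, for illustration only.
example_members = {
    "c1": ["seq1", "seq2", "seq3"],
    "c2": ["seq4"],
    "c3": ["seq5", "seq6"],
}
example_known = {
    "c1": ["geneA", "geneB"],
    "c3": ["geneC"],
}

summary = summarize_clusters(example_members, example_known)
print(summary)
```

The flagged clusters are only candidates: a singleton may be a real rare transcript, and a cluster matching two known genes may reflect a gene family rather than an assembly error, so both lists are starting points for inspection rather than verdicts.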

Many people will have advice about this, so ask, but also test. Looking at the results will inform you if the path is right. I hope this helps a little bit!


Jen
Galaxy team


On 11/25/13 1:16 PM, miroslav.sotak wrote:

To whom it may concern
I would like to kindly ask whether anyone has experience in de-novo transcriptomic analysis (no reference genome available) and might give us some advice. Our main question is how to create the best set of cDNA contigs, on which we can map our RNAseq reads for the analysis of differential expression. Currently 4 larger sets of RNAseq reads are available from different genotypes, as well as a draft genome assembly for one of the genotypes. We worry about the SNPs in different genotypes affecting the assembly if we combine all the RNAseq datasets and use assemblers such as Trinity, Oases, Velvet. Might it be better to use the draft genomic assembly to obtain cDNA contigs using Tophat/cufflinks, via all available RNAseq data or only using the RNAseq data from the same genotype as the genome draft?
Thank you in advance
Best wishes
Miro Sotak
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

 http://galaxyproject.org/search/mailinglists/


--
Jennifer Hillman-Jackson
http://galaxyproject.org
