Hello,
Interesting genome. I see that SRA has some public RNA-seq data, but
there isn't much else going on. And your goal is to characterize the
expression for observed phenotypes (linked to known genotypes)? If you
use the Tuxedo suite after assembly (Trinity or other), differential
expression of alternatively spliced isoforms is one of the discovery
outputs.
From my experience (and others are welcome to add comments), most SNP
differences (_single_ base polymorphisms) do not in general impact the
global assembly of whole genome data. Larger insertions/deletions are
where you will observe differences. But that is DNA.
For transcript assembly, including from RNA-seq, sample-specific novel
isoforms, and in particular rare events like SNPs, can become diluted
when multiple samples are combined and assembled together directly
de novo. Still, obtaining full length cDNAs is certainly possible. And
it has been done just about the same way, with various types of RNA
data, for a very long time (most of RefSeq started out that way). The
downside here is that "the most common variant" can overwhelm the rarer
ones, but with a plant you might have that issue anyway depending on
ploidy. So, test for yourself. Genomes can vary and the tools are so
interesting - "same way" is a gross generalization on my part; in their
specifics the tools are very sophisticated.
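To put a rough number on the dilution point, here is a minimal Python
sketch. It is only toy arithmetic on read counts, not how any assembler
actually weighs evidence, and the function name, depths and fractions
are all made up for illustration.

```python
def pooled_variant_fraction(depths, variant_fractions):
    """Fraction of pooled reads carrying a variant at one position.

    depths            -- read depth at the position, one value per sample
    variant_fractions -- fraction of each sample's reads with the variant
    Both lists are hypothetical inputs for illustration only.
    """
    if len(depths) != len(variant_fractions):
        raise ValueError("need one variant fraction per sample")
    total_reads = sum(depths)
    if total_reads == 0:
        raise ValueError("no reads cover this position")
    variant_reads = sum(d * f for d, f in zip(depths, variant_fractions))
    return variant_reads / total_reads


# Four samples at equal depth; the variant is fixed in only the first one.
equal_depth = pooled_variant_fraction([100, 100, 100, 100],
                                      [1.0, 0.0, 0.0, 0.0])

# Same variant, but the sample carrying it is sequenced less deeply.
uneven_depth = pooled_variant_fraction([50, 400, 400, 400],
                                       [1.0, 0.0, 0.0, 0.0])

print(f"equal depth:  {equal_depth:.2f}")
print(f"uneven depth: {uneven_depth:.2f}")
```

A variant present in every read of one sample drops to a quarter of the
pooled reads at equal depth, and lower still if that sample is the
shallow one.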
And, most importantly, as you do have a reference genome to use as a
guide (and that is really an invaluable tool not to be ignored) be sure
to incorporate it unless it is from a sample that is known to be
significantly, unacceptably, different from the wildtype. It sounds like
the quality has been judged too low to use it directly as a reference
genome for some reason (correct? Or do you just want to build up the
cDNA set - great project!). But the genome can still be utilized.
Specifically, using it as an early-stage assembly guide will give you a
huge advantage, in my opinion (some assemblers cluster the data first by
mapping; you want this if possible). But again, you could try it both
ways and check a few genes to see how the transcript profile worked out
against any knowns (comparative ones are OK; I always used these when I
did this type of work). Also use what are, to me, the truth metrics of
transcript assembly: how many singletons did you end up with (and what
do they map to? can they really be ignored?), and how many
over-clustered "genes" did you get (interestingly, sparser genes get
gobbled up by abundant housekeeping genes). Under-clustered
genes/transcripts or incomplete transcripts are other factors, but
depending on how you set the parameters in Cufflinks, these may be less
important, as long as they are not a pathological problem.
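The metrics above can be tallied with something like the following
Python sketch. It assumes you already have two tables from your own
pipeline: which sequences ended up in which cluster, and which known
gene (if any) each sequence maps to. The function name, the input
layout and the example IDs are all hypothetical, not the output format
of Cufflinks, Trinity or any other tool.

```python
from collections import defaultdict


def cluster_metrics(clusters, known_gene):
    """Summarize an assembly against a set of known genes.

    clusters   -- dict: cluster id -> list of member sequence ids
    known_gene -- dict: sequence id -> known gene it maps to
                  (sequences with no hit are simply absent)
    Both structures are assumed inputs, built from your own results.
    """
    # Singletons: clusters with exactly one member sequence.
    singletons = sorted(cid for cid, members in clusters.items()
                        if len(members) == 1)

    over_clustered = []                  # one cluster, several known genes
    gene_to_clusters = defaultdict(set)  # known gene -> clusters it landed in
    for cid, members in clusters.items():
        genes = {known_gene[m] for m in members if m in known_gene}
        if len(genes) > 1:
            over_clustered.append(cid)
        for gene in genes:
            gene_to_clusters[gene].add(cid)

    # Under-clustered: one known gene split across several clusters.
    under_clustered = sorted(gene for gene, cids in gene_to_clusters.items()
                             if len(cids) > 1)

    return {
        "singletons": singletons,
        "over_clustered": sorted(over_clustered),
        "under_clustered": under_clustered,
    }


# Made-up example: c1 mixes two genes, c2 is a singleton,
# and geneD is split between c3 and c4.
example_clusters = {
    "c1": ["r1", "r2", "r3"],
    "c2": ["r4"],
    "c3": ["r5", "r6"],
    "c4": ["r7", "r8"],
}
example_known = {
    "r1": "geneA", "r2": "geneA", "r3": "geneB",
    "r4": "geneC", "r5": "geneD", "r7": "geneD",
}
metrics = cluster_metrics(example_clusters, example_known)
print(metrics)
```

Run it once on the genome-guided result and once on the pure de novo
result, then compare the three lists. The singleton list is also the
place to start when asking what they map to.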
Many people will have advice about this, so ask, but also test. Looking
at the results will tell you whether the path is right. I hope this
helps a little bit!
Jen
Galaxy team
On 11/25/13 1:16 PM, miroslav.sotak wrote:
To whom it may concern
I would like to kindly ask whether anyone with experience in
de novo transcriptomic analysis (no reference genome available)
might give us some advice.
Our main question is how to create the best set of cDNA contigs, on
which we can map our RNAseq reads for the analysis of differential
expression. Currently 4 larger sets of RNAseq reads are available
from different genotypes, as well as a draft genome assembly for one of
the genotypes. We worry about the SNPs in different genotypes
affecting the assembly if we combine all the RNAseq datasets and
use assemblers such as Trinity, Oases, Velvet. Might it be better to
use the draft genomic assembly to obtain cDNA contigs using
Tophat/cufflinks, via all available RNAseq data or only using the
RNAseq data from the same genotype as the genome draft?
Thank you in advance
Best wishes
Miro Sotak
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
--
Jennifer Hillman-Jackson
http://galaxyproject.org