> I've started analyzing my RNA-Seq data for two time points: Day0 and Day4 for
> control and treated. I've done aligning the data to the reference genome
> using Tophat. I've removed duplicates from the data sets. Could somebody
> please tell me, how important is it to remove duplicates and how will it
> influence my results if I don't remove?
This depends on whether you removed duplicates from your FASTQ data and/or
removed multi-mapping reads, either within Tophat or in a post-processing
step. In any case, this approach will affect quantitation outputs from
Cufflinks and likely transcript assemblies as well.
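To see why quantitation shifts, here is a minimal Python sketch on invented
toy data. The gene names, coordinates, and the rule "same chromosome, start,
and strand means duplicate" are all assumptions for illustration; this is not
how Tophat, Cufflinks, or any particular duplicate-removal tool works
internally.

```python
from collections import Counter

# Hypothetical toy alignments: (gene, chromosome, start, strand).
# All values are invented for illustration only.
alignments = [
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 150, "+"),
    ("geneB", "chr2", 500, "-"),
    ("geneB", "chr2", 520, "-"),
]


def count_per_gene(reads):
    """Count how many reads are assigned to each gene."""
    return Counter(gene for gene, *_ in reads)


def remove_duplicates(reads):
    """Keep only the first read seen at each (chromosome, start, strand)."""
    seen = set()
    kept = []
    for read in reads:
        key = read[1:]  # position-based key; an assumed duplicate definition
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


raw_counts = count_per_gene(alignments)
dedup_counts = count_per_gene(remove_duplicates(alignments))

print("raw:  ", dict(raw_counts))
print("dedup:", dict(dedup_counts))
```

In this toy case geneA has twice as many reads as geneB before duplicate
removal and the same number afterwards, so the relative expression of the two
genes changes. With real data, reads stacked at one position in a highly
expressed gene can be genuine signal rather than PCR artifacts, which is why
the choice matters.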
> I want to start with Cufflinks all the way through to Cuffdiff. Where do I
> start since there are just so many options (in the manual) to choose from?
> What do I look for?
Here's a tutorial that will help you get started with RNA-seq analysis:
Galaxy makes it easy to experiment with different parameter values, so you'll
want to read the Cufflinks/Cuffcompare/Cuffdiff manual and adjust the
parameters that are relevant to your data:
In general, RNA-seq studies look at (a) transcripts assembled; (b) expression
values; and (c) differential expression estimates.
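As a rough illustration of how (b) and (c) relate, the sketch below computes a
log2 fold change between two expression values. The function name, the
pseudocount, and the numbers are assumptions made up for this example; Cuffdiff
applies its own statistical model and significance testing, which this does not
reproduce.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of treated over control expression.

    The pseudocount (an assumed value) avoids division by zero when a
    transcript is not detected in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values for one transcript in each condition.
lfc = log2_fold_change(control=7.0, treated=31.0)
print(lfc)
```

A positive value means higher expression in the treated sample, a negative
value means lower; whether the difference is significant is a separate
question that the differential expression test answers.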