> 1. It seems that it is better to run everything up to cuffdiff, but does 
> cuffdiff allow multiple-sample comparison? I read somewhere that even 
> for multiple samples it still compares them pairwise.

Cuffdiff supports replicate analysis.

> In a sense, because I want to do clustering, which needs some quantitative 
> data source to do the merging, will cuffdiff provide me some quantitative 
> measures rather than the test score and p-value, which are too qualitative to 
> include? 

Take a look at the Cuffdiff documentation for its output files; the FPKM 
tracking files report per-sample FPKM values with confidence intervals.
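
As a rough sketch of how those quantitative outputs could feed a clustering 
step, the snippet below reads a table laid out like Cuffdiff's 
genes.fpkm_tracking file and builds a log2(FPKM + pseudocount) matrix. The 
column naming (<sample>_FPKM, <sample>_status), the sample labels q1/q2, the 
function name fpkm_matrix and the pseudocount of 1 are assumptions made for 
illustration, and the example rows are invented and trimmed to the columns 
used here. Check the header of your own file before relying on it.

```python
import csv
import io
import math

# Made-up, trimmed excerpt in the layout of genes.fpkm_tracking
# (tab-separated; the sample labels q1/q2 and every value are illustrative).
EXAMPLE_TRACKING = (
    "tracking_id\tq1_FPKM\tq1_conf_lo\tq1_conf_hi\tq1_status\t"
    "q2_FPKM\tq2_conf_lo\tq2_conf_hi\tq2_status\n"
    "geneA\t3.0\t2.1\t3.9\tOK\t7.0\t5.5\t8.5\tOK\n"
    "geneB\t0.0\t0.0\t0.0\tFAIL\t1.0\t0.2\t1.8\tOK\n"
)


def fpkm_matrix(handle, pseudocount=1.0):
    """Return (sample_names, {tracking_id: [log2(FPKM + pseudocount), ...]}).

    Hypothetical helper: assumes each sample has <sample>_FPKM and
    <sample>_status columns, and drops rows where any status is not "OK".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    suffix = "_FPKM"
    samples = [name[:-len(suffix)] for name in reader.fieldnames
               if name.endswith(suffix)]
    matrix = {}
    for row in reader:
        # Skip features that were not quantified successfully in every sample.
        if any(row[s + "_status"] != "OK" for s in samples):
            continue
        matrix[row["tracking_id"]] = [
            math.log2(float(row[s + suffix]) + pseudocount) for s in samples
        ]
    return samples, matrix


samples, matrix = fpkm_matrix(io.StringIO(EXAMPLE_TRACKING))
print(samples)
print(matrix)
```

Note that this keeps only the point estimates. The conf_lo and conf_hi 
columns carry the uncertainty, so a clustering built this way does not account 
for it.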

> 2. If I really need to get count data from the FPKM values, how do I obtain 
> the mentioned "effective length"? Would it be better if I treated each 
> assembled transcript as an object in clustering, rather than each gene? What 
> does "you'd be throwing away Cufflinks' uncertainty" mean, even when using 
> isoforms as objects? How should I include the uncertainty in my clustering?

These FAQs from http://cufflinks.cbcb.umd.edu/faq.html address your questions:

I want to find differentially expressed genes. Can I use Cufflinks in 
conjunction with count-based differential expression packages?

It's possible, but we strongly advise against this. Current count-based 
differential expression tools are poorly suited to differential expression 
analysis in genomes with alternatively spliced genes. The main reason for this 
is that when a gene has multiple isoforms, a change in the total number of 
reads or fragments from that gene doesn't always correspond to a change in 
expression for that gene. Conversely, a gene's expression may change, but the 
total number of fragments generated by its isoforms may be very similar. In 
order to detect changes accurately, it's necessary to estimate how many 
fragments came from each individual splice variant in each sample. Current 
count-based tools don't do this (to our knowledge - please send us email if you 
know of one!). Even if they did, fragments that come from parts of genes that 
are shared by more than one splice variant can't generally be assigned to a single 
isoform, so the fragment counts for each isoform are only estimates, and there 
is some uncertainty in the counts. Isoforms that are very similar will have a 
great deal of uncertainty surrounding their fragment counts. This uncertainty 
needs to be accounted for when testing for differential expression. So while 
you could use Cufflinks to estimate isoform-level counts, you'd be throwing 
away Cufflinks' uncertainty, and thus have more confidence in the differences 
you see than you really should. This will probably lead to many false positives 
in your analysis. Furthermore, we do not normalize simply by the transcript 
length to calculate FPKM, but by an effective length, as explained in our 
publications. Calculating counts from FPKM by multiplying by the length will give incorrect 
results. We strongly encourage you to consider using Cuffdiff to find 
differentially expressed genes and transcripts.
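
A toy calculation may make the first point in that answer concrete. It is 
only a sketch under a simplified model that is not Cufflinks' own: the 
expected number of fragments from an isoform is taken to be proportional to 
its copy number times its length, and the effective-length correction 
mentioned above is ignored. The isoform names, lengths, copy numbers and the 
FRAGMENTS_PER_KB constant are all invented.

```python
# Hypothetical gene with two isoforms; lengths in kilobases (invented values).
ISOFORM_LENGTH_KB = {"short": 1, "long": 3}

# Assumed sequencing yield: fragments per kb of transcript per copy.
FRAGMENTS_PER_KB = 10


def gene_expression(copies):
    """Gene-level expression as the total number of transcript copies."""
    return sum(copies.values())


def gene_fragments(copies):
    """Expected fragments from the gene under the simplified model:
    each isoform contributes copies x length x yield."""
    return sum(n * ISOFORM_LENGTH_KB[iso] * FRAGMENTS_PER_KB
               for iso, n in copies.items())


# Case 1: an isoform switch with unchanged gene expression.
cond_a = {"short": 100, "long": 0}
cond_b = {"short": 0, "long": 100}
print(gene_expression(cond_a), gene_fragments(cond_a))
print(gene_expression(cond_b), gene_fragments(cond_b))

# Case 2: gene expression drops threefold, yet the fragment total is equal.
cond_c = {"short": 90, "long": 0}
cond_d = {"short": 0, "long": 30}
print(gene_expression(cond_c), gene_fragments(cond_c))
print(gene_expression(cond_d), gene_fragments(cond_d))
```

In case 1 the gene-level fragment total triples although the number of 
transcript copies is the same, and in case 2 the totals match although 
expression differs threefold. A tool that only sees the gene-level total 
would call the first a change and miss the second.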

Will you please report how many fragments come from each transcript in a future 

For the foreseeable future, we will not be reporting the number of fragments we 
think originated from each transcript. People who have asked for this almost 
always want to use Cufflinks in conjunction with count-based differential 
expression packages, which is not a good idea. We're trying to keep our output 
formats as simple as possible.

