Dear Sir or Madam,

I am planning to cluster several libraries based on the output of cuffcompare 
or cuffdiff, since either lets me construct a matrix whose columns represent 
the libraries and whose rows hold the counts for transcripts or genes. I want 
to build this matrix because it is the required input format for many RNA-seq 
clustering packages, e.g. baySeq and HTSCluster. However, the answer to the 
question "I want to find differentially expressed genes. Can I use Cufflinks 
in conjunction with count-based differential expression packages?" in the 
Cufflinks FAQ advises against converting FPKM values to count data. 
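To make the intended layout concrete, here is a minimal sketch of the matrix I have in mind. The function name, the library names and the values are hypothetical placeholders, not real Cufflinks output, and I am assuming the per-library values have already been parsed into plain dictionaries.

```python
def build_matrix(per_library):
    """Arrange per-library values into a features-by-libraries matrix.

    per_library maps a library name to a dict of {feature id: value}.
    Features missing from a library are filled with 0.0 (an assumption).
    """
    libraries = sorted(per_library)
    features = sorted(set().union(*(per_library[lib] for lib in libraries)))
    rows = [[per_library[lib].get(f, 0.0) for lib in libraries]
            for f in features]
    return libraries, features, rows


# Hypothetical values for two libraries and three genes.
example = {
    "libA": {"gene1": 12.0, "gene2": 3.5},
    "libB": {"gene1": 8.0, "gene3": 1.0},
}
libraries, features, rows = build_matrix(example)
for feature, row in zip(features, rows):
    print(feature, row)
```

Each row is one gene or transcript and each column is one library, which is the shape the clustering packages expect.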

My questions are:
1. It seems better to run everything up to cuffdiff, but does cuffdiff support 
comparing multiple samples at once? I read somewhere that even with multiple 
samples it still compares them pairwise. Since clustering needs a quantitative 
data source to drive the merging, will cuffdiff give me quantitative measures 
beyond the test statistic and p-value, which are too qualitative to use? 
2. If I really do need count data from the FPKM values, how do I obtain the 
"effective length" mentioned in the FAQ? Would it be better to treat each 
assembled transcript, rather than each gene, as an object in the clustering? 
What does "you'd be throwing away Cufflinks' uncertainty" mean, and does it 
still apply when isoforms are used as objects? How should I incorporate that 
uncertainty into my clustering?
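For reference, the back-calculation I would otherwise attempt is sketched below. It only inverts the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments). The function and parameter names are my own, the effective length is exactly the quantity I do not know how to obtain, and the result is a point estimate that ignores any uncertainty Cufflinks attaches to the FPKM.

```python
def fpkm_to_fragments(fpkm, effective_length_bp, total_mapped_fragments):
    """Approximate fragment count implied by an FPKM value.

    Inverts FPKM = fragments / (length in kb * mapped fragments in millions).
    effective_length_bp is assumed to be known; this sketch does not
    compute it.
    """
    length_kb = effective_length_bp / 1e3
    mapped_millions = total_mapped_fragments / 1e6
    return fpkm * length_kb * mapped_millions


# Hypothetical numbers: FPKM of 10, 2,000 bp effective length,
# 5 million mapped fragments in the library.
estimate = fpkm_to_fragments(10.0, 2000, 5_000_000)
print(estimate)  # 100.0
```

If this is the wrong way to go about it, I would appreciate a pointer to the recommended alternative.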

