> I find a lot of potential new genes (hundreds or thousands of reads aligning 
> to regions where there is no gene annotation),

This shouldn't be completely unexpected. High-coverage RNA-seq data is 
constantly revealing new exons/splicing/transcripts, even in well-annotated 

> I also find new exons for some genes or exons with different sizes. I was 
> thinking to do a de novo assembly to find new transcripts and genes, but I 
> was wondering if there is something else I could do.

My suggestion: do reference-guided assembly with Cufflinks; this will yield 
both existing and new transcripts.

> For example, maybe I could just extract those regions where thousands of 
> reads align (new gene). I know that we can extract the sequence data for 
> specific transcript, is it possible to extract reads for regions without 
> annotation, based only on the number of reads aligned?

You could subtract known genes from the Cufflinks assembly to get only novel 
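
To make the subtraction idea concrete, here is a minimal sketch in Python. It assumes you have already reduced both the assembly and the annotation to simple (chrom, start, end, name) intervals with BED-style 0-based, half-open coordinates; the function name, the tuple layout, and the example intervals are all made up for illustration and are not anything Cufflinks or Galaxy produces directly. It also ignores strand, and because it drops anything that overlaps a known gene, it would discard novel isoforms of annotated genes as well.

```python
from collections import defaultdict


def novel_transcripts(assembled, known):
    """Return assembled intervals that overlap no known gene interval.

    Both inputs are lists of (chrom, start, end, name) tuples using
    0-based, half-open coordinates. Strand is ignored in this sketch.
    """
    # Group known gene intervals by chromosome for quick lookup.
    known_by_chrom = defaultdict(list)
    for chrom, start, end, _name in known:
        known_by_chrom[chrom].append((start, end))

    novel = []
    for chrom, start, end, name in assembled:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < k_end and k_start < end
            for k_start, k_end in known_by_chrom.get(chrom, [])
        )
        if not overlaps:
            novel.append((chrom, start, end, name))
    return novel


# Hypothetical example data, for illustration only.
known_genes = [("chr1", 100, 500, "geneA"), ("chr1", 900, 1200, "geneB")]
assembly = [
    ("chr1", 150, 450, "CUFF.1.1"),  # inside geneA, so it is dropped
    ("chr1", 600, 800, "CUFF.2.1"),  # between annotated genes, so it is kept
    ("chr2", 100, 300, "CUFF.3.1"),  # chromosome with no annotation, so it is kept
]
novel = novel_transcripts(assembly, known_genes)
```

On real data an interval-subtraction tool will be faster than this all-against-all loop; the sketch is only meant to show what "subtract known genes" means in terms of overlaps.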

