Hi Everyone,
    I am working with *Aedes aegypti* and have obtained around 500 million
reads (HiSeq 2000, 50 bp). After running a differential gene expression
analysis with the usual packages (TopHat, Cufflinks, DESeq, etc.), I found
a set of genes of interest, in addition to some functional groups of genes
that I already knew I had to look at.

Now, browsing the 4,758 supercontigs together with my data in IGV from the
Broad Institute (loading the genome and the SAM files from TopHat), I see a
lot of potential new genes (hundreds or thousands of reads aligning to
regions where there is no gene annotation). I also see new exons for some
genes, or exons with different sizes.

I was thinking of doing a *de novo* assembly to find new transcripts and
genes, but I was wondering whether there is something else I could do. For
example, maybe I could just extract the regions where thousands of reads
align (putative new genes). I know that we can extract the sequence data
for a specific transcript; is it possible to extract reads for regions
without annotation, based only on the number of reads aligned? Maybe I
could pool all the data (from a couple of sequencing lanes), align it back
to the genome, and then proceed to gene annotation.

Another problem is that I am not sure how reliable an annotation based only
on HiSeq 2000 data would be. I would appreciate any ideas or suggestions on
how to tackle this problem. Maybe *de novo* assembly is the way to go.
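
The idea of picking out regions purely by read count could be sketched
roughly as follows. This is only an illustration under assumptions: the
function name, the window size, the read threshold, and the contig name in
the usage note are all hypothetical, and the inputs are assumed to have
already been parsed from the alignments and the existing annotation into
plain Python tuples.

```python
from collections import Counter


def find_unannotated_regions(reads, genes, window=1000, min_reads=1000):
    """Flag fixed-size windows with many reads and no annotated gene.

    reads: iterable of (contig, start) 0-based alignment start positions.
    genes: iterable of (contig, start, end) half-open annotated intervals.
    Returns a list of (contig, start, end, read_count) tuples, where
    adjacent flagged windows on the same contig are merged.
    window and min_reads are illustrative defaults, not recommendations.
    """
    # Count reads per window, assigning each read by its start position.
    counts = Counter((contig, pos // window) for contig, pos in reads)

    # Mark every window touched by an annotated gene.
    annotated = set()
    for contig, start, end in genes:
        for w in range(start // window, (end - 1) // window + 1):
            annotated.add((contig, w))

    # Keep windows that are well covered and carry no annotation.
    hits = sorted(
        key for key, n in counts.items()
        if n >= min_reads and key not in annotated
    )

    # Merge directly adjacent windows into longer candidate regions.
    regions = []
    for contig, w in hits:
        n = counts[(contig, w)]
        if regions and regions[-1][0] == contig and regions[-1][2] == w * window:
            prev = regions[-1]
            regions[-1] = (contig, prev[1], (w + 1) * window, prev[3] + n)
        else:
            regions.append((contig, w * window, (w + 1) * window, n))
    return regions
```

For example, calling it with reads piled up at positions 1500 and 2100 of a
made-up contig "supercont1.1" and a single annotated gene at 5000-6000
would report one merged candidate region covering 1000-3000. Fixed windows
are a crude stand-in for real coverage-based segmentation, and reads are
counted by start position only, so region boundaries are approximate.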

Thank you.

*Luciano Cosme*

PhD Candidate
Texas A&M Entomology
Vector Biology Research Group
979 845 1885