Hi,

I am working with D. virilis transcriptomics data from SOLiD next-generation 
sequencing. To begin, I need to map the reads to the D. virilis reference 
genome and count the read tags for annotated genes. I looked at the 
D. virilis files available at:
http://hgdownload.cse.ucsc.edu/goldenPath/droVir2/

I need the genome sequence in FASTA format and the gene annotation in 
GTF (or GFF) format, with coordinates corresponding to the genome FASTA 
file. Looking at the files in the UCSC download repository, it is 
unfortunately not completely clear to me which of them would be the 
best to use, especially since I cannot find any documentation that 
clearly explains the content of the different files under the 'database' 
folder.

For the genome sequence, the SOLiD support person advised me that 
'scaffoldFa.gz' under the 'bigZips' folder would probably be the best file 
to use as the genome file. 

I was wondering whether the 'xenoRefGene.txt' file would provide a good gene 
annotation. But as there isn't any documentation, how can I be sure that the 
gene coordinates given in this file match the coordinates of the sequences 
in the 'scaffoldFa.gz' genome FASTA file? This is quite hard for me to 
check, so it would be really helpful if someone could clarify it. 
I will of course still need to convert this information into GTF (or 
GFF), as it does not seem to be readily available in that format. 
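
To make the question concrete, the check and the conversion I have in mind 
could be sketched roughly as below. This is only a minimal sketch under 
assumptions I have not been able to confirm: that 'xenoRefGene.txt' is a 
tab-separated genePred-style table with a leading 'bin' column, that its 
coordinates are 0-based with exclusive ends, and that its sequence-name 
column uses the same names as the FASTA headers. All function names are 
hypothetical.

```python
def fasta_lengths(handle):
    """Return {sequence_name: length} from an open FASTA text handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def parse_genepred(line, has_bin=True):
    """Parse one tab-separated row into a dict.

    ASSUMPTION: columns are [bin,] name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
    """
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "tx_start": int(fields[3]),
        "tx_end": int(fields[4]),
        "exons": list(zip(starts, ends)),
    }


def check_record(rec, lengths):
    """Return None if the record fits the FASTA, else a short problem string."""
    if rec["chrom"] not in lengths:
        return "sequence name not found in FASTA"
    if rec["tx_end"] > lengths[rec["chrom"]]:
        return "annotation extends past the end of the sequence"
    return None


def to_gtf_lines(rec, source="xenoRefGene"):
    """Emit one GTF 'exon' line per exon.

    ASSUMPTION: input starts are 0-based, so 1 is added to each start to get
    the 1-based inclusive coordinates that GTF uses; ends stay unchanged.
    """
    attrs = 'gene_id "%s"; transcript_id "%s";' % (rec["name"], rec["name"])
    lines = []
    for start, end in rec["exons"]:
        lines.append("\t".join([
            rec["chrom"], source, "exon",
            str(start + 1), str(end), ".", rec["strand"], ".", attrs,
        ]))
    return lines
```

The FASTA handle could come from gzip.open('scaffoldFa.gz', 'rt'). A record 
passing check_record only shows that the names and lengths are compatible, 
not that the annotation was really built on this assembly. Also, if the same 
name occurs in several rows, the transcript_id values would need to be made 
unique before the GTF is used for counting.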

Thanks in advance for any advice regarding this matter,
Greetings,
Asta Laiho

--
High-throughput Bioinformatics Group
Turku Centre for Biotechnology
University of Turku, Finland


_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
