Re: [Genome] D.virilis genome & gene annotation files for ngs mapping

Hiram Clawson Wed, 23 Mar 2011 12:03:16 -0700

Good Morning Asta:

Please note the three different versions of D. virilis sequences available:


http://hgdownload.cse.ucsc.edu/downloads.html#droVir

The sequence for droVir2 is indeed in the file scaffoldFa.gz, as
mentioned in the README file:

http://hgdownload.cse.ucsc.edu/goldenPath/droVir2/bigZips/README.txt

scaffoldFa.gz - The working draft sequence in one FASTA record per scaffold.
    Repeats from RepeatMasker and Tandem Repeats Finder (with period
    of 12 or less) are in lower case while non-repeating sequence is
    in upper case.
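
As a rough illustration of the soft-masking convention described in the
README, the sketch below reads a FASTA file and reports, for each scaffold,
its length and the number of lower-case (repeat-masked) bases. The function
name scaffold_stats is made up for this example and is not part of any UCSC
tool; the only assumption about the file is the one-record-per-scaffold
layout quoted above.

```python
import gzip


def scaffold_stats(fasta_path):
    """Return {scaffold_name: (length, soft_masked_bases)} for a FASTA file.

    Hypothetical helper for illustration only. Lower-case bases are counted
    as soft-masked, following the README description.
    """
    # Read .gz files transparently; fall back to plain text otherwise.
    opener = gzip.open if fasta_path.endswith(".gz") else open
    stats = {}
    name = None
    length = masked = 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    stats[name] = (length, masked)
                # Use the first word of the header as the scaffold name.
                name = (line[1:].split() or [""])[0]
                length = masked = 0
            elif name is not None:
                length += len(line)
                masked += sum(1 for base in line if base.islower())
    if name is not None:
        stats[name] = (length, masked)
    return stats
```

The scaffold names and lengths returned this way can also serve as a sanity
check for an annotation file: every scaffold name in the annotation should
appear in the dictionary, and no feature should end beyond the scaffold
length.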

For potential gene tracks, take a look at the genome browser display of
these genomes:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=droVir2

and note the different types of gene tracks.  Any file in the database
dump directory will be for that genome assembly.  If you need information
about the structure of a file, look at the corresponding .sql file.
Some of the formats will correspond to our standard file formats:
http://genome.ucsc.edu/FAQ/FAQformat.html

There are many file format converter programs in the kent source tree:
http://genome.ucsc.edu/admin/git.html
http://genome.ucsc.edu/admin/jk-install.html
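
To give an idea of what such a conversion involves, here is a minimal
sketch that turns one row of a gene table dump into GTF exon lines. It
assumes the row follows the genePred layout with a leading bin column
(bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount,
exonStarts, exonEnds, score, name2, ...); please confirm the actual column
order in the corresponding .sql file before relying on it. The function
name genepred_row_to_gtf is hypothetical, and the sketch writes exon
features only. The genePredToGtf program in the kent source tree is the
more complete route.

```python
def genepred_row_to_gtf(row, source="xenoRefGene"):
    """Convert one tab-separated genePred row (with leading bin column)
    into a list of GTF exon lines.

    Hypothetical helper for illustration; the column positions are an
    assumption and should be checked against the table's .sql file.
    """
    fields = row.rstrip("\n").split("\t")
    name, chrom, strand = fields[1], fields[2], fields[3]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    # Use name2 as the gene identifier when the column is present.
    gene_id = fields[12] if len(fields) > 12 and fields[12] else name
    attrs = f'gene_id "{gene_id}"; transcript_id "{name}";'
    lines = []
    for start, end in zip(starts, ends):
        # Table starts are 0-based; GTF is 1-based with inclusive ends,
        # so only the start coordinate is shifted by one.
        lines.append("\t".join(
            [chrom, source, "exon", str(start + 1), str(end),
             ".", strand, ".", attrs]
        ))
    return lines
```

Note that in a table such as xenoRefGene the same accession may align to
more than one location, so a real conversion would need to make the
transcript_id values unique.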

--Hiram

----- Original Message -----
From: "Asta Laiho" <[email protected]>
To: [email protected]
Sent: Wednesday, March 23, 2011 6:37:52 AM
Subject: [Genome] D.virilis genome & gene annotation files for ngs mapping

Hi,

I am working with D.virilis SOLiD next-generation sequencing transcriptomics 
data. To begin, I would need to map the data to the D.virilis reference genome 
and count the read tags for annotated genes. I looked at the D.virilis files 
available at:
http://hgdownload.cse.ucsc.edu/goldenPath/droVir2/

I would need the genome sequence in fasta format and the gene information in 
gtf (or gff) format with the coordinates corresponding to the genome fasta 
file. When I looked at the files available at the UCSC download repository, it 
unfortunately wasn't completely clear to me which of those files would be the 
best to use, especially since I cannot find any documentation that 
clearly explains the content of the different files under the 'database' 
folder.

For the genome file, I got advice from the SOLiD support person that 
'scaffoldFa.gz' under the 'bigZips' folder would probably be the best to 
use. 

I was wondering whether the 'xenoRefGene.txt' file would provide a good gene 
annotation? But as there isn't any documentation, how can I be sure that the 
coordinates given in this file for the genes match the coordinates as 
indexed in the 'scaffoldFa.gz' genome fasta file? This would be quite hard to 
check, so it would be really helpful if someone could clarify this 
for me. I will of course still need to convert this information into gtf (or 
gff), as this does not seem to be readily available. 

Thanks in advance for any advice regarding this matter,
Greetings,
Asta Laiho

--
High-throughput Bioinformatics Group
Turku Centre for Biotechnology
University of Turku, Finland


_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome