Hi Stefanie,

The best way to get information is to look at the track description pages for 
the tracks you are interested in. Here are links to a few description pages to 
help you get started:

C. elegans

WormBase Genes: http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chrII&g=sangerGene

In this case the data were obtained from the Sanger Institute FTP site. Please 
contact them with any questions you have about the data.


Drosophila

FlyBase Genes: http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr2L&g=flyBaseGene

The data for this track were downloaded from FlyBase. Please contact them 
regarding any questions you have about the data.


Regarding how the UTRs are calculated on the human Known Genes track 
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr21&g=knownGene):

"For non-RefSeq transcripts we use the txCdsPredict program to determine if the 
transcript is protein-coding and if so, the locations of the start and stop 
codons. The program weighs as positive evidence the length of the protein, the 
presence of a Kozak consensus sequence at the start codon, and the length of 
the orthologous predicted protein in other species. As negative evidence it 
considers nonsense-mediated decay and start codons in any frame upstream of the 
predicted start codon. For RefSeq transcripts the RefSeq protein prediction is 
used directly instead of this procedure. For CCDS proteins the CCDS protein is 
used directly."

So if the transcripts were obtained from RefSeq, the txCdsPredict program isn't 
used. It is used only for non-RefSeq transcripts (i.e., those from GenBank). In 
either case, the UTRs are the parts of the transcript's exons that lie outside 
the coding region.
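
If it helps, here is a minimal Python sketch of how UTR lengths could be 
derived from one row of knownGene.txt. It is an illustration only, not code 
from our pipeline. It assumes the tab-separated column order name, chrom, 
strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds 
(please check this against the table schema for your assembly), and that 
non-coding entries have cdsStart equal to cdsEnd. The function name 
utr_lengths and the example row are made up for this example.

```python
def utr_lengths(row):
    """Return (five_prime_len, three_prime_len) for one knownGene.txt row.

    Assumed column order: name, chrom, strand, txStart, txEnd, cdsStart,
    cdsEnd, exonCount, exonStarts, exonEnds, ... (verify against the schema).
    Returns None when the row has no coding region (cdsStart == cdsEnd).
    """
    fields = row.rstrip("\n").split("\t")
    strand = fields[2]
    cds_start, cds_end = int(fields[5]), int(fields[6])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].rstrip(",").split(",")]
    ends = [int(x) for x in fields[9].rstrip(",").split(",")]

    if cds_start == cds_end:
        return None  # assumed convention for non-coding transcripts

    # Exonic bases to the left of the CDS (lower genomic coordinates).
    left = sum(max(0, min(e, cds_start) - s) for s, e in zip(starts, ends))
    # Exonic bases to the right of the CDS (higher genomic coordinates).
    right = sum(max(0, e - max(s, cds_end)) for s, e in zip(starts, ends))

    # On the minus strand the 5' UTR is at the higher-coordinate end.
    return (left, right) if strand == "+" else (right, left)


# Hypothetical row: three exons, CDS from 150 to 520.
example = "\t".join([
    "uc001.1", "chr1", "+", "100", "600", "150", "520",
    "3", "100,300,500,", "200,400,600,",
])
print(utr_lengths(example))
```

Only exonic bases are counted, so introns that fall within a UTR do not 
add to its length.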


Also, please feel free to search our mailing list archives to see whether other 
users have asked similar questions. From the home page, click "Contact Us", 
then enter your search terms in the input box next to "Search the Genome 
mailing list archives:".

Hope this clarifies things for you.

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group




----- Original Message -----
From: "Stefanie Gerstberger" <[email protected]>
To: "UCSC" <[email protected]>
Sent: Tuesday, January 25, 2011 11:12:31 AM
Subject: [Genome] lengths of UTRs


>
>Hi,
>
>
>I was wondering in particular how the lengths of untranscribed regions in the 
>UCSC genome browser are defined. In particular with respect to recent 
> developments in the determination of 3'ends in several genomes such as 
>C.elegans and Drosophila, I was wondering how the annotations of untranslated 
>regions in UCSC are made.
>
>
> I was in particular wondering about the human knownGenes and in specific how 
>the utrs in the knownGene.txt are defined. As far as I read those are either 
>taken directly from refseq or, if there were not present in refseq the protein 
>coding sequence was calculated using  txCdsPredict. So am I correct about that 
>the length of the transcribed regions is not calculated by UCSC but comes 
>straight from either refseq or genebank?
>Thanks again,
>Stefanie

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome