Hi Stefanie,

The best way to get information is to look at the track description pages for the tracks you are interested in. Here are links to a few description pages to help you get started:

C. elegans WormBase Genes:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chrII&g=sangerGene
In this case the data were provided from the Sanger Institute FTP site. Please contact them regarding any questions you have about the data.

Drosophila FlyBase Genes:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr2L&g=flyBaseGene
The data for this track were downloaded from FlyBase. Please contact them regarding any questions you have about the data.

Regarding how the UTRs are calculated on the human Known Genes track (http://genome.ucsc.edu/cgi-bin/hgTrackUi?c=chr21&g=knownGene), the track description page says:

"For non-RefSeq transcripts we use the txCdsPredict program to determine if the transcript is protein-coding and if so, the locations of the start and stop codons. The program weighs as positive evidence the length of the protein, the presence of a Kozak consensus sequence at the start codon, and the length of the orthologous predicted protein in other species. As negative evidence it considers nonsense-mediated decay and start codons in any frame upstream of the predicted start codon. For RefSeq transcripts the RefSeq protein prediction is used directly instead of this procedure. For CCDS proteins the CCDS protein is used directly."

So if the transcripts were obtained from RefSeq, the txCdsPredict program isn't used. The txCdsPredict program is used only for non-RefSeq transcripts (i.e., GenBank).

Also, please feel free to search our mailing list archives to see if other users have asked similar questions. You can do this from the home page by clicking on "Contact Us" and then entering your search terms in the input box next to "Search the Genome mailing list archives:".

Hope this clarifies things for you.
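
In case it helps to see how UTR lengths follow from the coding-region coordinates, here is a minimal sketch in Python. It is an illustration only, not how the track itself is built. It assumes a tab-separated knownGene.txt row in the usual genePred-style column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...), with 0-based half-open coordinates, and it assumes that a row with cdsStart equal to cdsEnd is non-coding. The function name utr_lengths is made up for this example.

```python
def utr_lengths(row):
    """Return (5' UTR length, 3' UTR length) in bases for one knownGene row.

    Assumes genePred-style columns: name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
    """
    fields = row.rstrip("\n").split("\t")
    strand = fields[2]
    cds_start, cds_end = int(fields[5]), int(fields[6])
    # exonStarts / exonEnds are comma-separated lists with a trailing comma
    exon_starts = [int(x) for x in fields[8].rstrip(",").split(",")]
    exon_ends = [int(x) for x in fields[9].rstrip(",").split(",")]

    # Assumption: cdsStart == cdsEnd marks a non-coding transcript (no UTRs)
    if cds_start == cds_end:
        return 0, 0

    left = right = 0
    for start, end in zip(exon_starts, exon_ends):
        # exonic bases that lie before the CDS on the genome
        left += max(0, min(end, cds_start) - start)
        # exonic bases that lie after the CDS on the genome
        right += max(0, end - max(start, cds_end))

    # On the minus strand the genomic left side is the 3' end
    return (left, right) if strand == "+" else (right, left)
```

Only exonic bases outside the cdsStart/cdsEnd interval are counted, so introns falling inside a UTR do not inflate the length.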

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

----- Original Message -----
From: "Stefanie Gerstberger" <[email protected]>
To: "UCSC" <[email protected]>
Sent: Tuesday, January 25, 2011 11:12:31 AM
Subject: [Genome] lengths of UTRs

> Hi,
>
> I was wondering in particular how the lengths of untranscribed regions in the
> UCSC genome browser are defined. In particular with respect to recent
> developments in the determination of 3'ends in several genomes such as
> C.elegans and Drosophila, I was wondering how the annotations of untranslated
> regions in UCSC are made.
>
> I was in particular wondering about the human knownGenes and in specific how
> the utrs in the knownGene.txt are defined. As far as I read those are either
> taken directly from refseq or, if there were not present in refseq the protein
> coding sequence was calculated using txCdsPredict. So am I correct about that
> the length of the transcribed regions is not calculated by UCSC but comes
> straight from either refseq or genebank?
> Thanks again,
> Stefanie

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
