Hello Pete,

For data in the mRNA track, the CDS information comes directly from the GenBank record. As you noticed, this information sometimes contains notational characters. These are important, especially the arrows ("<" and ">"), since they mean that the CDS is incomplete in the mRNA (it extends in the direction of the arrow to an unspecified position). NCBI provides descriptions of how to interpret these annotations. One place to start: http://www.ncbi.nlm.nih.gov/collab/FT/index.html#2.3
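
As a rough illustration of the arrow notation, here is a minimal Python sketch (not UCSC or NCBI code) that splits a simple CDS location string such as "1..>108" into its numbers and two "incomplete" flags. The names parse_cds and CdsSpan are made up for this example, and the sketch assumes the value is a single start..end span; anything more complex is rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class CdsSpan(NamedTuple):
    """Hypothetical container for one parsed CDS location."""
    start: int            # 1-based position in the mRNA sequence
    end: int              # 1-based, inclusive
    start_partial: bool   # True if the location began with "<"
    end_partial: bool     # True if the end position was prefixed with ">"


# Matches simple spans only, e.g. "45..1200", "<1..200", "1..>108".
_SIMPLE_SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_cds(location: str) -> CdsSpan:
    """Parse a simple start..end CDS location, keeping the partial markers."""
    match = _SIMPLE_SPAN.match(location.strip())
    if match is None:
        # Assumption: other location forms are out of scope for this sketch.
        raise ValueError(f"unsupported CDS location: {location!r}")
    lt, start, gt, end = match.groups()
    return CdsSpan(int(start), int(end), lt == "<", gt == ">")
```

For example, parse_cds("1..>108") gives start 1, end 108, with only end_partial set, meaning the coding region runs past the last base of the mRNA sequence.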

When viewing the details page for a sequence in the mRNA track, this CDS data is displayed exactly as given. For an example, using hg19, search for "EF143990". Then click on the sequence name in the Browser display to reach the description page, which lists "CDS: 1..>108".

Are you perhaps examining other tracks (from the Gene and Gene Prediction group?) when you are seeing more clearly defined coding regions? If so, then you will need to use the tables associated with those tracks to extract coding regions (CDS). For a genePred table, the relevant fields are cdsStart and cdsEnd. These are genomic coordinates, so they follow the UCSC coordinate rules: zero-based, half-open, with respect to the (+) strand. To convert to 1-based, fully closed coordinates, add 1 to the start to get the first base actually covered. For sequences aligned to the (-) strand, also reverse the coordinates to get the actual (-) strand coding range(s).

http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

Hopefully this helps to make the data more clear. If you have a specific example where the table data differs from the display, we would be glad to take a look.

1) Please note exactly how the track data was found: assembly, track, identifier, position (as a double check), and what CDS data you see on the description page.
2) Please note exactly how the table data was found: assembly, track, identifier, position, tables used/linked and how, and what data you find in the cds.name field.

We hope this helps, but follow-up questions are welcome,

Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 5/11/10 11:13 AM, Pete Shepard wrote:
> Dear Genome Browser Folks,
>
> I have been trying to extract mrna information from your all_mrna table.
> I would like to get the coding start and end information for each mrna.
> Currently I am doing this using the cds.name field, which has the start and
> stop of some of the mrna, but some numbers have > or < included in this
> field. In the browser, this information seems to be available for each gene,
> but not using my method. I am wondering if there is a better way of
> getting this information?
>
> TIA
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
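
As a footnote to the genePred conversion described in the reply, here is a minimal Python sketch of one way to read it. The function name cds_to_one_based is made up for this example. It assumes cdsStart, cdsEnd and strand are taken from a genePred row, and it treats an empty interval (cdsStart equal to cdsEnd) as "no CDS", which is an assumption of this sketch. "Reversing" for the (-) strand is interpreted here as reporting the range in the direction of transcription.

```python
from typing import Optional, Tuple


def cds_to_one_based(cds_start: int, cds_end: int,
                     strand: str) -> Optional[Tuple[int, int]]:
    """Convert a zero-based, half-open CDS interval to 1-based, fully closed.

    Returns (first_coding_base, last_coding_base) in transcription order,
    or None when the interval is empty.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if cds_start >= cds_end:
        # Assumption: an empty interval means no coding region is annotated.
        return None
    first = cds_start + 1  # zero-based start -> 1-based first covered base
    last = cds_end         # half-open end already equals the 1-based last base
    if strand == "-":
        # On the (-) strand the CDS is read from the higher coordinate down.
        return (last, first)
    return (first, last)
```

With this reading, a (+) strand row with cdsStart 100 and cdsEnd 200 covers bases 101 to 200, and the same row on the (-) strand is reported as 200 down to 101.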
