Slight correction on the table contents:

knownGene = the alignment of individual transcripts
knownIsoforms = groups these transcripts to define a cluster (gene bound)
knownCanonical = the single transcript from any cluster
    (knownIsoforms.clusterID) chosen to represent the group
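
As a rough illustration of how these three tables fit together, here is a minimal Python sketch that turns RefSeq IDs into "gene name plus isoform sub-tag" labels. It is only a sketch under stated assumptions: the rows are simplified to the few fields used here, the transcript IDs (ucT1, ...) and gene symbols (GENE_A, ...) are made-up placeholders, the grouping of the three NM_* IDs into one cluster is assumed for the example, and the "-n" suffix is an invented naming convention, not something UCSC provides.

```python
from collections import defaultdict


def label_isoforms(known_isoforms, known_canonical, kg_xref):
    """Return {refseq_id: "SYMBOL-n"} for every transcript that has a RefSeq ID.

    known_isoforms:  iterable of (cluster_id, transcript_id) rows
    known_canonical: dict cluster_id -> representative transcript_id
    kg_xref:         dict transcript_id -> (gene_symbol, refseq_id)
    (Simplified, assumed layouts; real tables carry more columns.)
    """
    # knownIsoforms: collect the transcripts that belong to each cluster.
    clusters = defaultdict(list)
    for cluster_id, transcript_id in known_isoforms:
        clusters[cluster_id].append(transcript_id)

    labels = {}
    for cluster_id, transcripts in clusters.items():
        # knownCanonical: the representative transcript names the whole cluster.
        symbol = kg_xref[known_canonical[cluster_id]][0]
        # Number the isoforms in a stable (sorted) order.
        for n, transcript_id in enumerate(sorted(transcripts), start=1):
            refseq = kg_xref[transcript_id][1]
            if refseq:  # skip transcripts without a RefSeq accession
                labels[refseq] = f"{symbol}-{n}"
    return labels


# Placeholder data: IDs and symbols below are hypothetical.
known_isoforms = [(1, "ucT1"), (1, "ucT2"), (1, "ucT3"), (2, "ucT4")]
known_canonical = {1: "ucT2", 2: "ucT4"}
kg_xref = {
    "ucT1": ("GENE_A", "NM_001145277"),
    "ucT2": ("GENE_A", "NM_001145278"),
    "ucT3": ("GENE_A", "NM_018090"),
    "ucT4": ("GENE_B", ""),
}

labels = label_isoforms(known_isoforms, known_canonical, kg_xref)
```

With real data, the three inputs would be built from Table Browser downloads of knownIsoforms, knownCanonical, and kgXref.
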

My apologies, I confused knownCanonical with knownIsoforms in my earlier email!

Also, see this previous answer for more details:
https://lists.soe.ucsc.edu/pipermail/genome/2009-June/019228.html

Jennifer Jackson
UCSC Genome Bioinformatics Group

Jennifer Jackson wrote:
> Hi Mike,
>
> The IDs starting with NM_* are RefSeq IDs. These come directly from
> GenBank. The format is like:
>
> nucleotide sequences: NM_XXXXXX.NN
> protein sequences:    NP_XXXXXX.NN
>
> Where the X's are a string of numbers and the N's are a version number.
> Click through one of these in the Browser to see the GenBank data sheet
> for these sequences at NCBI. The RefSeq sequences are not exactly
> clustered by gene from NCBI, although variants are noted by text
> descriptions here. Many groups (including the UCSC Bioinformatics team)
> take in this data and do some clustering.
>
> The track in the UCSC Browser with this information is the UCSC Gene
> track. It includes sequences from several sources, including the RefSeqs
> from NCBI, arranged to create a comprehensive, non-redundant version of
> the transcriptome/proteome. This will not be as complete as fly (since
> the fly annotation is "complete"), but it is the best view to date. For
> this track, the actual nucleotide transcript sequences are given a
> special unique identifier, but this is mapped to the nucleotide and
> protein sources (both the ones actually used and those rolled in when
> redundancy was removed) and they are grouped into gene-bound clusters.
>
> Open the UCSC Gene track and click on the description page to view how
> the data was created. Also click on one of the data points to view all
> of the associated data linked in. Bring up the track in the Table
> browser to view the tables, schema, linked tables, and content details.
>
> knownGene - alignment data per transcript
> knownCanonical - groups transcripts into clusters
> kgXref - links in all associated IDs
> kgAlias - another ID linking table (RefSeqs included)
> refLink, knownToLocusLink - more linked data, including Locus link ID
> (many other tables linked in)
>
> Examine the data and please let us know if you need more help,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Duff wrote:
>
>> I have been developing informatics scripts used primarily in our
>> analysis of RNAseq data for Drosophila. One of the starting points for
>> our analysis is a gene model specified by the UCSC Table browser in the
>> form of a .BED file, which lists each isoform name (e.g. CG1674-RA,
>> CG1674-RB, ...) along with each isoform's exon coordinates. The
>> association between isoform and gene is straightforward from the
>> isoform ID/name.
>>
>> Lately, I've been attempting to adapt the analysis scripts to Human
>> expression data, and I'm encountering difficulty in locating, or
>> piecing together, a similar gene model. I'm trying to work with the
>> most up-to-date (Feb 2009) annotations, but the gene/isoform naming
>> convention there seems quite different from that for fly. For example,
>> NM_001145277, NM_001145278, and NM_018090 appear (judging from txStart
>> & txEnd) to be different isoforms associated with a common gene, though
>> there is nothing within the isoform names themselves to indicate a
>> common gene (and using common txStart/txEnd values to associate
>> isoforms with common genes would seem, in general, to be incorrect).
>>
>> My question is: For Human Feb 2009 annotations, does there exist a
>> table that translates from NM_* IDs to an ID scheme similar to that
>> adopted for fly; i.e., a standard gene name followed by an isoform name
>> sub-tag?
>>
>> Any suggestions you might have would be appreciated.
>>
>> -Mike
>>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
