Re: [Genome] Human (Feb 2009) gene models: gene/isoform naming convention

Jennifer Jackson Fri, 19 Jun 2009 19:38:48 -0700

Slight correction on the table contents:

knownGene = the alignment of individual transcripts
knownIsoforms = groups these transcripts to define a cluster (gene bound)
knownCanonical = the single transcript from any cluster 
(knownIsoforms.clusterID) chosen to represent the group


My apologies, I confused knownCanonical with knownIsoforms in my earlier 
email!

Also, see this previous answer for more details: 
https://lists.soe.ucsc.edu/pipermail/genome/2009-June/019228.html
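
As a rough illustration of how the three tables fit together, here is a
minimal Python sketch that groups transcripts by cluster and gives each
RefSeq ID a gene-plus-isoform label. It is only a sketch under stated
assumptions: the column names (clusterId, transcript, kgID, refseq,
geneSymbol) are my reading of the schema and should be checked in the
Table Browser, the rows are made-up toy data, and the "SYMBOL-n" label
format and the function name label_isoforms are hypothetical, not a
UCSC convention.

```python
from collections import defaultdict

# Toy rows standing in for the real tables; every ID below is made up.
# knownIsoforms: (clusterId, transcript)
KNOWN_ISOFORMS = [(1, "tx_a"), (1, "tx_b"), (2, "tx_c")]
# knownCanonical: clusterId -> representative transcript
KNOWN_CANONICAL = {1: "tx_a", 2: "tx_c"}
# kgXref: kgID -> (refseq, geneSymbol); column names are assumptions
KG_XREF = {
    "tx_a": ("NM_DEMO1", "GENEA"),
    "tx_b": ("NM_DEMO2", "GENEA"),
    "tx_c": ("NM_DEMO3", "GENEB"),
}


def label_isoforms(isoforms, canonical, xref):
    """Return {refseq: label}, where label is '<geneSymbol>-<n>' per cluster."""
    # Step 1: collect the transcripts belonging to each cluster.
    clusters = defaultdict(list)
    for cluster_id, transcript in isoforms:
        clusters[cluster_id].append(transcript)

    labels = {}
    for cluster_id, transcripts in clusters.items():
        # Step 2: name the cluster after its canonical transcript's symbol.
        symbol = xref[canonical[cluster_id]][1]
        # Step 3: number the isoforms within the cluster (hypothetical scheme).
        for n, transcript in enumerate(sorted(transcripts), start=1):
            refseq = xref[transcript][0]
            if refseq:  # a transcript may have no RefSeq accession
                labels[refseq] = f"{symbol}-{n}"
    return labels


LABELS = label_isoforms(KNOWN_ISOFORMS, KNOWN_CANONICAL, KG_XREF)
```

With real data, the three inputs would be filled from Table Browser
downloads of knownIsoforms, knownCanonical, and kgXref instead of the
toy rows above.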

Jennifer Jackson
UCSC Genome Bioinformatics Group

Jennifer Jackson wrote:
> Hi Mike,
>
> The IDs starting with NM_* are RefSeq IDs. These come directly from 
> GenBank. The format is:
>
> nucleotide sequences: NM_XXXXXX.NN
> protein sequences: NP_XXXXXX.NN
>
> Where the X's are a string of digits and the N's are a version number. 
> Click through one of these in the Browser to see the GenBank data sheet 
> for these sequences at NCBI. The RefSeq sequences are not exactly 
> clustered by gene from NCBI, although variants are noted by text 
> descriptions here. Many groups (including the UCSC Bioinformatics team) 
> take in this data and do some clustering.
>
> The track in the UCSC Browser with this information is the UCSC Gene 
> track. It includes sequences from several sources, including the RefSeqs 
> from NCBI, arranged to create a comprehensive, non-redundant version of 
> the transcriptome/proteome. This will not be as complete as fly (since 
> it is "complete") but it is the best view to date. For this track, the 
> actual nucleotide transcript sequences are given a special unique 
> identifier, but this is mapped to the nucleotide and protein sources 
> (both the actual used and those rolled in when redundancy was removed) 
> and they are grouped into gene bound clusters.
>
> Open the UCSC Gene track and click on the description page to view how 
> the data was created. Also click on one of the data points to view all 
> of the associated data linked in. Bring up the track in the Table 
> browser to view the tables, schema, linked tables, and content details.
>
> knownGene - alignment data per transcript
> knownCanonical - groups transcripts into clusters
> kgXref - links in all associated IDs
> kgAlias - another ID linking table (RefSeqs included)
> refLink, knownToLocusLink - more linked data, including Locus link ID
> (many other tables linked in)
>
> Examine the data and please let us know if you need more help,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Duff wrote:
>   
>> I have been developing informatics scripts used primarily in our analysis of
>> RNAseq data for Drosophila. One of the starting points for our analysis is a
>> gene model specified by the UCSC Table browser
>> in the form of a .BED file, which lists each isoform name (e.g. CG1674-RA,
>> CG1674-RB,...) along with each isoform's exon coordinates. The association
>> between isoform and gene is straightforward from the isoformID/name.
>>
>> Lately, I've been attempting to adapt the analysis scripts to human expression
>> data, and I'm encountering difficulty in locating, or piecing together, a
>> similar gene model. I'm trying to work with the most up-to-date (Feb 2009)
>> annotations, but the gene/isoform naming convention there seems quite
>> different from that for fly. For example NM_001145277, NM_001145278, and
>> NM_018090 appear (judging from txStart & txEnd) to be different isoforms
>> associated with a common gene, though there is nothing within the isoform
>> names themselves to indicate a common gene (and using common txStart/Ends
>> to associate isoforms with common genes would seem, in general, to be
>> incorrect).
>>
>> My question is: For Human Feb 2009 annotations, does there exist a table
>> that translates from NM_* IDs to an ID-scheme similar to that adopted for
>> fly; i.e., a standard gene name followed by an isoform name sub-tag?
>>
>> Any suggestions you might have would be appreciated.
>>
>>
>>
>> -Mike
>>
>>
>>   
>>     
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>   