Thank you, Jennifer.
I had hoped to piece together an isoform/gene BED-style annotation model
from the most recent hg19 (Feb 2009) annotations. The linked tables you
mention all appear to be available for the previous hg18 (March 2006)
release only, so I will go with that for now, maybe using clusterIDs as
ersatz genes and the knownIsoforms table to map isoforms to "genes."
Thanks again.
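
For reference, a minimal sketch of that mapping in Python. It assumes the
knownIsoforms table has been exported from the Table Browser as tab-separated
text with two columns (clusterId, transcript), and that the BED name field
(column 4) holds the same transcript IDs. The function names and the idea of
passing in open file handles are illustrative assumptions, not part of any
UCSC tool.

```python
import csv
from collections import defaultdict


def load_known_isoforms(lines):
    """Map transcript ID -> clusterId from a knownIsoforms dump.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumed layout: clusterId <TAB> transcript, optional '#' header line.
    """
    transcript_to_cluster = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and the header
        cluster_id, transcript = row[0], row[1]
        transcript_to_cluster[transcript] = cluster_id
    return transcript_to_cluster


def group_bed_by_cluster(bed_lines, transcript_to_cluster):
    """Group BED rows by clusterId, treating each cluster as an ersatz gene.

    BED rows whose name (column 4) has no cluster entry are dropped.
    """
    genes = defaultdict(list)
    for row in csv.reader(bed_lines, delimiter="\t"):
        if len(row) < 4 or row[0].startswith("#"):
            continue  # skips 'track'/'browser' lines and short rows
        cluster_id = transcript_to_cluster.get(row[3])
        if cluster_id is not None:
            genes[cluster_id].append(row)
    return dict(genes)
```

Usage would be along the lines of
group_bed_by_cluster(open("knownGene.bed"), load_known_isoforms(open("knownIsoforms.txt"))),
where both file names are placeholders for whatever the Table Browser export
was saved as.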

-Mike

On Fri, Jun 19, 2009 at 1:53 PM, Jennifer Jackson <[email protected]> wrote:

> Hi Mike,
>
> The IDs starting with NM_* are RefSeq IDs. These come directly from
> GenBank. The format is like:
>
> nucleotide sequences: NM_XXXXXX.NN
> protein sequences: NP_XXXXXX.NN
>
> Where the X's are a string of numbers and the N's are a version number.
> Click through one of these in the Browser to see the Genbank data sheet for
> these sequences at NCBI. The RefSeq sequences are not exactly clustered by
> gene from NCBI, although variants are noted by text descriptions here. Many
> groups (including the UCSC Bioinformatics team) take in this data and do
> some clustering.
>
> The track in the UCSC Browser with this information is the UCSC Gene track.
> It includes sequences from several sources, including the RefSeqs from NCBI,
> arranged to create a comprehensive, non-redundant, version of the
> transcriptome/proteome. This will not be as complete as for fly (whose
> annotation is "complete"), but it is the best view to date. For this track, the actual
> nucleotide transcript sequences are given a special unique identifier, but
> this is mapped to the nucleotide and protein sources (both the actual used
> and those rolled in when redundancy was removed) and they are grouped into
> gene bound clusters.
>
> Open the UCSC Gene track and click on the description page to view how the
> data was created. Also click on one of the data points to view all of the
> associated data linked in. Bring up the track in the Table browser to view
> the tables, schema, linked tables, and content details.
>
> knownGene - alignment data per transcript
> knownCanonical - groups transcripts into clusters
> kgXref - links in all associated IDs
> kgAlias - another ID linking table (RefSeqs included)
> refLink, knownToLocusLink - more linked data, including Locus link ID
> (many other tables linked in)
>
> Examine the data and please let us know if you need more help,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Duff wrote:
>
>> I have been developing informatics scripts used primarily in our analysis
>> of RNAseq data for Drosophila. One of the starting points for our analysis
>> is a gene model specified by the UCSC Table Browser in the form of a .BED
>> file, which lists each isoform name (e.g. CG1674-RA, CG1674-RB, ...) along
>> with the coordinates of each isoform's exons. The association between
>> isoform and gene is straightforward from the isoform ID/name.
>>
>> Lately, I've been attempting to adapt the analysis scripts to human
>> expression data, and I'm encountering difficulty in locating, or piecing
>> together, a similar gene model. I'm trying to work with the most up-to-date
>> (Feb 2009) annotations, but the gene/isoform naming convention there seems
>> quite different from that for fly. For example, NM_001145277, NM_001145278,
>> and NM_018090 appear (judging from txStart & txEnd) to be different isoforms
>> associated with a common gene, though there is nothing within the isoform
>> names themselves to indicate a common gene (and using common txStart/txEnd
>> values to associate isoforms with common genes would seem, in general, to
>> be incorrect).
>>
>> My question is: for the Human Feb 2009 annotations, does there exist a
>> table that translates from NM_* IDs to an ID scheme similar to that adopted
>> for fly, i.e., a standard gene name followed by an isoform name sub-tag?
>>
>> Any suggestions you might have would be appreciated.
>>
>>
>>
>> -Mike
>>
>>
>>
>>
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
