Re: [Genome] Human (Feb 2009) gene models: gene/isoform naming convention

Jennifer Jackson Mon, 22 Jun 2009 10:53:39 -0700

Hello,

You are correct - we are still creating data for the new Feb 2009 (hg19)
Human genome assembly. This includes the UCSC Genes track. We are working
to release data as quickly as possible. Keep an eye on the hg19 Release Log
for notifications about new data track releases.


Glad to hear that you are able to use the hg18 data for now,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Duff wrote:
> Thank you Jennifer.
> I had hoped to piece together an isoform/gene
> bed-style annotation model from the most recent
> hg19 Feb 2009 annotations. The linked tables
> you mention all appear to be available for the
> previous hg18 March 2006 release only, and
> I will go with that for now, maybe using clusterIDs
> as ersatz genes and the knownIsoforms table
> to map isoforms to "genes."  Thanks again.
>
> -Mike
>
> On Fri, Jun 19, 2009 at 1:53 PM, Jennifer Jackson wrote:
>
>     Hi Mike,
>
>     The IDs starting with NM_* are RefSeq IDs. These come directly
>     from GenBank. The format is like:
>
>     nucleotide sequences: NM_XXXXXX.NN
>     protein sequences: NP_XXXXXX.NN
>
>     Where the X's are a string of numbers and the N's are a version
>     number. Click through one of these in the Browser to see the
>     GenBank data sheet for these sequences at NCBI. NCBI does not
>     strictly cluster the RefSeq sequences by gene, although variants
>     are noted in the text descriptions there. Many groups (including
>     the UCSC Bioinformatics team) take in this data and do some
>     clustering.
>
>     The track in the UCSC Browser with this information is the UCSC
>     Genes track. It includes sequences from several sources, including
>     the RefSeqs from NCBI, arranged to create a comprehensive,
>     non-redundant version of the transcriptome/proteome. This will
>     not be as complete as the fly set (since that one is "complete"),
>     but it is the best view to date. For this track, each nucleotide
>     transcript sequence is given its own unique identifier, which
>     is mapped to the nucleotide and protein sources (both those
>     actually used and those rolled in when redundancy was removed),
>     and the transcripts are grouped into gene-bound clusters.
>
>     Open the UCSC Genes track and click on the description page to
>     view how the data was created. Also click on one of the data
>     points to view all of the associated data linked in. Bring up the
>     track in the Table Browser to view the tables, schema, linked
>     tables, and content details.
>
>     knownGene - alignment data per transcript
>     knownCanonical - groups transcripts into clusters
>     kgXref - links in all associated IDs
>     kgAlias - another ID linking table (RefSeqs included)
>     refLink, knownToLocusLink - more linked data, including Locus link ID
>     (many other tables linked in)
>
>     Examine the data and please let us know if you need more help,
>     Jennifer Jackson
>     UCSC Genome Bioinformatics Group
>
>     Duff wrote:
>
>         I have been developing informatics scripts used primarily in
>         our analysis of RNAseq data for Drosophila. One of the starting
>         points for our analysis is a gene model specified by the UCSC
>         Table Browser in the form of a .BED file, which lists each
>         isoform name (e.g. CG1674-RA, CG1674-RB, ...) along with the
>         coordinates of each isoform's exons. The association between
>         isoform and gene is straightforward from the isoform ID/name.
>
>         Lately, I've been attempting to adapt the analysis scripts to
>         Human expression data, and I'm encountering difficulty in
>         locating, or piecing together, a similar gene model. I'm trying
>         to work with the most up-to-date (Feb 2009) annotations, but the
>         gene/isoform naming convention there seems quite different from
>         that for fly. For example, NM_001145277, NM_001145278, and
>         NM_018090 appear (judging from txStart & txEnd) to be different
>         isoforms associated with a common gene, though there is nothing
>         within the isoform names themselves to indicate a common gene
>         (and using common txStart/txEnd values to associate isoforms
>         with common genes would seem, in general, to be incorrect).
>
>         My question is: For Human Feb 2009 annotations, does there
>         exist a table that translates from NM_* IDs to an ID scheme
>         similar to that adopted for fly; i.e., a standard gene name
>         followed by an isoform name sub-tag?
>
>         Any suggestions you might have would be appreciated.
>
>         -Mike
>
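
For readers following this thread, the workaround discussed above (treating
knownIsoforms cluster IDs as ersatz genes and attaching fly-style isoform
tags) might be sketched in Python roughly as follows. This is only an
illustration, not a UCSC tool. It assumes knownIsoforms rows arrive as
(clusterId, transcript) pairs and kgXref rows as (kgID, geneSymbol, refseq)
triples; check the Table Browser schema pages for the actual column layout.
The function names and the "-RA", "-RB" suffix scheme are made up for this
example.

```python
import re
from collections import defaultdict

# RefSeq accession: NM or NP prefix, underscore, digits, optional ".version".
REFSEQ_RE = re.compile(r"^(N[MP])_(\d+)(?:\.(\d+))?$")


def split_refseq(acc):
    """Return (base accession, version or None) for an NM_/NP_ accession."""
    m = REFSEQ_RE.match(acc)
    if m is None:
        raise ValueError(f"not an NM_/NP_ accession: {acc!r}")
    prefix, digits, version = m.groups()
    return f"{prefix}_{digits}", (int(version) if version else None)


def suffix(index):
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet-style letters)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def isoform_labels(known_isoforms, kg_xref):
    """Map each transcript to a fly-style label such as 'GENE1-RA'.

    known_isoforms: iterable of (clusterId, transcript)    # assumed layout
    kg_xref:        iterable of (kgID, geneSymbol, refseq)  # assumed layout
    Keys of the result are versionless RefSeq accessions where kgXref
    supplies one, otherwise the transcript ID itself.
    """
    xref = {kg_id: (symbol, refseq) for kg_id, symbol, refseq in kg_xref}
    clusters = defaultdict(list)
    for cluster_id, transcript in known_isoforms:
        clusters[cluster_id].append(transcript)

    labels = {}
    for cluster_id, transcripts in clusters.items():
        # Name the cluster after the first gene symbol found, else its ID.
        symbols = [xref[t][0] for t in transcripts if t in xref and xref[t][0]]
        gene = symbols[0] if symbols else f"cluster{cluster_id}"
        for index, transcript in enumerate(sorted(transcripts)):
            refseq = xref.get(transcript, ("", ""))[1]
            key = split_refseq(refseq)[0] if refseq else transcript
            labels[key] = f"{gene}-R{suffix(index)}"
    return labels


if __name__ == "__main__":
    # Placeholder rows: the transcript IDs and gene symbol are invented.
    isoforms = [(1, "uc900aaa.1"), (1, "uc900aab.1")]
    xrefs = [("uc900aaa.1", "GENE1", "NM_001145277"),
             ("uc900aab.1", "GENE1", "NM_018090")]
    for name, label in sorted(isoform_labels(isoforms, xrefs).items()):
        print(name, label)
```

Sorting transcripts inside each cluster keeps the suffix assignment stable
between runs, but the letters carry no biological meaning; they only make
cluster membership readable from the name, as it is for fly.
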
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
