Hello Andrew,

The gene region you are examining is complex.

Technically, UCSC has clustered these transcripts into two distinct genes (as shown by the different cluster IDs). The two clusters do not share exons, and sharing exons is what the gene clustering algorithm used by the UCSC Genes processing requires in order to group transcripts together (see the track description for all the details).

Scientifically, the entire transcript set appears to be related: the annotation notes that the upstream group is protein-coding with a regulatory (transcription factor) function, while the downstream group is non-coding with a vaguely defined oncogene function. The two groups are not the same gene (in the classical sense), and they are clearly not paralogs. Given these data, merging the clusters or using a single representative transcript would probably result in a loss of information.

So, why is MYC associated with both? This is likely a function of how the gene symbols are brought into the processing for the track. The genes are related. The kgXref table can bring in associations through many sources, and sometimes the gene symbols/labels should be interpreted to mean "associated with gene X" rather than "is gene X". It looks like the second, non-coding gene has stronger MYC annotation via RefSeq, but the best advice is to examine all of the evidence yourself (at UCSC and in the external sources/literature) to flesh out the exact details.

Hopefully this helps,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/15/10 2:31 PM, Andrew Yee wrote:
> When I was using the knownCanonical table to find the canonical transcript
> for MYC, I find that there are two entries. See below. I also included
> some fields from hg19.kgXref fields. Is there an accepted method to
> determine which one is the most "canonical" transcript? Perhaps use the
> transcript where there is a "NM" as the prefix in refseq?
>
> Thanks,
> Andrew
>
> #hg19.knownCanonical.chrom hg19.knownCanonical.chromStart
> hg19.knownCanonical.chromEnd hg19.knownCanonical.clusterId
> hg19.knownCanonical.transcript hg19.knownCanonical.protein hg19.kgXref.kgID
> hg19.kgXref.mRNA hg19.kgXref.spID hg19.kgXref.spDisplayID
> hg19.kgXref.geneSymbol hg19.kgXref.refseq
>
> chr8 128748314 128753678 24861 uc003ysi.2 uc003ysi.2
> uc003ysi.2 NM_002467 A0N2G3 A0N2G3_HUMAN MYC NM_002467
> chr8 128806778 129113498 24862 uc010mdq.2 uc010mdq.2
> uc010mdq.2 NR_003367 MYC NR_003367
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
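
For anyone who wants to try the "NM" prefix idea from the quoted question, here is a minimal sketch in Python. It hard-codes the two rows from the query output above (reduced to four fields) and separates them by RefSeq accession prefix, where NM_ marks an mRNA record and NR_ a non-coding RNA record. The function names (refseq_class, clusters_by_symbol, coding_candidates) are hypothetical, and the filter is only a heuristic under that assumption, not a method UCSC defines. As the reply above points out, reducing the two clusters to one transcript may discard information.

```python
from collections import defaultdict

# The two rows from the quoted query output, reduced to the fields used here.
ROWS = [
    {"clusterId": 24861, "transcript": "uc003ysi.2",
     "geneSymbol": "MYC", "refseq": "NM_002467"},
    {"clusterId": 24862, "transcript": "uc010mdq.2",
     "geneSymbol": "MYC", "refseq": "NR_003367"},
]


def refseq_class(accession):
    """Classify a RefSeq accession by prefix: NM_ = mRNA, NR_ = non-coding RNA."""
    if accession.startswith("NM_"):
        return "protein-coding"
    if accession.startswith("NR_"):
        return "non-coding"
    return "unknown"


def clusters_by_symbol(rows):
    """Group rows by gene symbol so symbols spanning several clusters stand out."""
    by_symbol = defaultdict(list)
    for row in rows:
        by_symbol[row["geneSymbol"]].append(row)
    return dict(by_symbol)


def coding_candidates(rows, symbol):
    """Heuristic only (an assumption, not a UCSC rule): keep NM_ transcripts."""
    return [
        row["transcript"]
        for row in rows
        if row["geneSymbol"] == symbol
        and refseq_class(row["refseq"]) == "protein-coding"
    ]


if __name__ == "__main__":
    for symbol, group in clusters_by_symbol(ROWS).items():
        for row in group:
            print(symbol, row["clusterId"], row["transcript"],
                  refseq_class(row["refseq"]))
    print("NM_ candidates for MYC:", coding_candidates(ROWS, "MYC"))
```

Listing every cluster attached to a symbol before filtering keeps the second, non-coding entry visible, so it can still be checked against the external evidence.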
