Hi Stefanie,

There are actually still two different genes in the Gene Sorter, but only one representative gene for the region is displayed by default. To see RBM4, click the "configure" button in the Gene Sorter and check the box next to "Show all splicing variants".

When the UCSC Genes track is made, splicing variants are clustered, and two tables are created to keep track of the clusters: knownIsoforms and knownCanonical. Every UCSC Gene is listed in the knownIsoforms table and is assigned to a cluster. For each cluster, a single representative gene is selected and put into the knownCanonical table. The one that is in the knownCanonical table is displayed by default in the Gene Sorter.

Here is the list of genes that are in this particular cluster (for the GRCh37/hg19 assembly):

  #hg19.knownGene.name  hg19.kgXref.geneSymbol  hg19.knownIsoforms.clusterId
  uc009yrh.2            RBM14                   4457
  uc001oit.2            RBM14                   4457
  uc009yri.2            RBM14                   4457
  uc009yrj.2            RBM4                    4457
  uc009yrk.2            RBM14/RBM4 fusion       4457
  uc001oiv.2            RBM4                    4457
  uc001oiw.1            RBM4                    4457
  uc001oix.1            RBM4                    4457
  uc010rpj.1            RBM4                    4457
  uc001oiy.1            RBM4                    4457
  uc001oiz.1            RBM4                    4457

One of our engineers had this to say about this cluster of genes:

---
It makes perfect sense that RBM14 and RBM4 are clustered together. They're located close together on chr11 and on the same strand, and there's loads of evidence of transcripts that fuse the two. These fusion products have evidence up to and including validated RefSeq transcripts and Swissprot proteins. So while NCBI and HGNC say that they're two different loci, maybe they are and maybe they're not.

It's also worth noting that both RBM14 and RBM4 are involved in RNA processing, and transcripts involved in RNA processing are often themselves processed in interesting and different ways. They could be one locus with alternative starts and ends, or could be two loci with read-through co-expression, and transcripts periodically spliced together. If they are two totally separate loci (which seems unlikely, IMHO), they're a great example of why it's so hard to determine gene boundaries in an automated fashion.
---

I hope that explains what you are seeing.
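
As a rough illustration of the clustering described above, here is a minimal Python sketch (not part of any UCSC tool) that groups the rows of the listing by clusterId and collects the transcripts under each gene symbol. The names `ROWS` and `symbols_by_cluster` are made up for this example; the data is copied from the listing in this message.

```python
from collections import defaultdict

# Rows copied from the cluster listing in this message:
# (hg19.knownGene.name, hg19.kgXref.geneSymbol, hg19.knownIsoforms.clusterId)
ROWS = [
    ("uc009yrh.2", "RBM14", 4457),
    ("uc001oit.2", "RBM14", 4457),
    ("uc009yri.2", "RBM14", 4457),
    ("uc009yrj.2", "RBM4", 4457),
    ("uc009yrk.2", "RBM14/RBM4 fusion", 4457),
    ("uc001oiv.2", "RBM4", 4457),
    ("uc001oiw.1", "RBM4", 4457),
    ("uc001oix.1", "RBM4", 4457),
    ("uc010rpj.1", "RBM4", 4457),
    ("uc001oiy.1", "RBM4", 4457),
    ("uc001oiz.1", "RBM4", 4457),
]


def symbols_by_cluster(rows):
    """Map each clusterId to {gene symbol: [transcript names]}.

    Hypothetical helper for illustration only.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for name, symbol, cluster_id in rows:
        clusters[cluster_id][symbol].append(name)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {cid: dict(symbols) for cid, symbols in clusters.items()}


summary = symbols_by_cluster(ROWS)
for cluster_id, symbols in summary.items():
    for symbol, names in symbols.items():
        print(cluster_id, symbol, len(names))
```

Run on these rows, the sketch reports a single cluster (4457) holding three RBM14 transcripts, seven RBM4 transcripts, and one fusion transcript, which is why only one representative is shown for both gene symbols by default.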

One caveat: we are seeing some odd sorting with "Show all splicing variants" turned on when sorting by either Gene Distance or Expression. We are looking into the cause.

If you have further questions, please feel free to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 01/02/11 10:53, Stefanie Gerstberger wrote:
> Hi,
> I noticed that on the ucsc gene sorter the gene RBM4 is known as RBM14. However
> on ncbi and hgnc those are 2 different genes. Why is that, what does this mean
> and how should I consider these genes?
> Thanks a lot,
> Stefanie
>
> ---------------------------------------------------
> Stefanie Gerstberger
> graduate student in Chemical Biology
> Tri-Institutional Program
> Cornell University,
> Rockefeller University,
> Memorial Sloan Kettering Cancer Center
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
