Hello Stefanie,

For more information on how the canonical set of known genes is defined, please see this previously answered mailing list question:

https://lists.soe.ucsc.edu/pipermail/genome/2005-July/008123.html

Coding transcripts are clustered together if they (1) share some exon(s) within their CDS regions, (2) have the same translation frame in the shared exon(s), and (3) are on the same strand. With these rules, we've identified about 27,000 distinct clusters in hg19.

With respect to your examples, they wouldn't be clustered together for a variety of reasons:

- A shares no exon with B: lines 1,2,3,4,10,11
- A and B are on opposite strands: line 5
- A and B have non-overlapping CDS regions: lines 6,7
- A and B have different translation frames at the shared exons: lines 8,9

Hopefully this information was helpful and answers your question. If you have further questions or require clarification, feel free to contact the mailing list at [email protected].

Regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

On 1/12/11 11:39 AM, Stefanie Gerstberger wrote:
> Hi,
> The ucsc canonical genes are the set of unique nonoverlapping (?) gene
> clusters,
> but I found that a large number of canonical genes are overlapping. However I
> thought that for a given gene cluster, if there are overlapping gene
> transcripts, only one selected variant is chosen as the canonical gene?
> I have attached some sample output for overlapping genes on chr1. (out of
> 2502
> genes 676 were found to be overlapping).
>
> 5   1424  uc001eso.1  chr1  +  149627308  149651107  uc009wlc.2  chr1  +  149576452  149672983
> 6   1424  uc009wle.1  chr1  +  149577754  149582602  uc009wlc.2  chr1  +  149576452  149672983
> 8   220   uc010plp.1  chr1  +  169096980  169097002  uc001gfr.1  chr1  +  169075946  169101960
> 9   1713  uc010ocn.1  chr1  -   17439786   17439808  uc001baf.2  chr1  -   17393256   17445948
> 19  945   uc001cxl.1  chr1  +   55013900   55076000  uc001cxn.2  chr1  -   55074850   55089200
> 29  30    uc009wlv.1  chr1  +  150521897  150524367  uc001eux.2  chr1  +  150521897  150533410
> 30  29    uc001eux.2  chr1  +  150521897  150533410  uc009wlv.1  chr1  +  150521897  150524367
> 35  36    uc009xak.1  chr1  +  203097405  203136533  uc001gzf.1  chr1  +  203096835  203136533
> 36  35    uc001gzf.1  chr1  +  203096835  203136533  uc009xak.1  chr1  +  203097405  203136533
> 43  2467  uc001byv.2  chr1  -   35833689   35835012  uc001byt.2  chr1  +   35734567   35887544
> 51  52    uc001grp.1  chr1  -  185576315  185591914  uc001gro.2  chr1  -  185527513  185597620
>
> How can this be explained, and how many gene clusters can be found in hg19?
> Thanks a lot,
> Stefanie
>
> ---------------------------------------------------
> Stefanie Gerstberger
> graduate student in Chemical Biology
> Tri-Institutional Program
> Cornell University,
> Rockefeller University,
> Memorial Sloan Kettering Cancer Center
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
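
The three clustering rules given in the reply (shared exon within the CDS, same translation frame in the shared exon, same strand) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code UCSC uses: the `Transcript` class and the functions `coding_blocks`, `frame_at`, and `same_cluster` are hypothetical names, coordinates are assumed to be 0-based and half-open, and "sharing an exon within the CDS" is approximated here as any overlap between exons after clipping them to the CDS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript record (not a UCSC table schema)."""
    name: str
    strand: str                    # "+" or "-"
    exons: List[Tuple[int, int]]   # half-open [start, end) genomic intervals
    cds_start: int                 # half-open CDS bounds on the genome
    cds_end: int


def coding_blocks(tx: Transcript) -> List[Tuple[int, int, int]]:
    """Clip exons to the CDS and return (start, end, offset) per block.

    offset is the number of coding bases translated before the block,
    counted in the direction of transcription.
    """
    clipped = [(max(s, tx.cds_start), min(e, tx.cds_end))
               for s, e in sorted(tx.exons)]
    clipped = [(s, e) for s, e in clipped if s < e]
    ordered = clipped if tx.strand == "+" else clipped[::-1]
    blocks, offset = [], 0
    for s, e in ordered:
        blocks.append((s, e, offset))
        offset += e - s
    return blocks


def frame_at(strand: str, block: Tuple[int, int, int], pos: int) -> int:
    """Codon position (0, 1, or 2) of genomic base pos inside a block."""
    s, e, offset = block
    within = pos - s if strand == "+" else e - 1 - pos
    return (offset + within) % 3


def same_cluster(a: Transcript, b: Transcript) -> bool:
    """Apply the three rules: same strand, shared CDS exon, same frame."""
    if a.strand != b.strand:                      # rule (3)
        return False
    for block_a in coding_blocks(a):
        for block_b in coding_blocks(b):
            lo = max(block_a[0], block_b[0])
            hi = min(block_a[1], block_b[1])
            if lo < hi:                           # rule (1): coding overlap
                # rule (2): on the same strand, agreeing at one shared
                # base means agreeing across the whole overlap
                if frame_at(a.strand, block_a, lo) == frame_at(b.strand, block_b, lo):
                    return True
    return False
```

Under this sketch, two transcripts whose genomic spans overlap (as in the quoted sample rows) still end up in separate clusters whenever any one of the three checks fails, which is why overlapping canonical genes are expected.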
