Hi Schragi,

I asked one of our engineers about this case. Here is what she said:

----
A subtlety of the UCSC Genes pipeline is that it doesn't put two genes in the same cluster if they have different translation frames. In this case, these isoforms have different translation frames. The easiest way to see it is in the protein translations, which are totally different. So that's why these isoforms don't cluster together, even though they seem to overlap each other perfectly.
----

If you have further questions, please feel free to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 01/17/11 01:08, Schragi Schwartz wrote:
> Hi,
> I was under the impression that the canonical genes set was a dataset of
> non-redundant genes, with different isoforms clustered as a single,
> representative transcript. However, I now came across the two ids
> "uc002hra.1" and "uc010cvq.1", which are obviously two splice isoforms, and
> yet they are both independently represented in the canonical gene dataset.
> I would be very grateful if you could clarify what's going on.
> Best,
> Schragi
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
