Hi Wilson,

You are seeing this difference because UCSC Genes transcripts were constructed by selecting the best evidence from different sources, such as RefSeq. We have created a new table, knownGeneTxMrna, that we now provide along with knownGeneMrna. The difference between the two tables is described on the UCSC Genes track description page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene):

*knownGeneMrna* contains the mRNA sequence that represents each UCSC Genes transcript. If the transcript is based on a RefSeq transcript, then this table contains the RefSeq transcript, including any portions that do not align to the genome.

*knownGeneTxMrna* contains mRNA sequences for each UCSC Genes transcript. In contrast to the sequences in knownGeneMrna, these sequences are derived by obtaining the sequence of each exon from the reference genome and concatenating these exonic sequences.

I hope this information is useful and answers your question. Please contact us again at [email protected] if you have any further questions.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group

On 7/4/2012 8:56 PM, Wilson Lwtan wrote:
> Hi,
>
> I would like to draw your attention to the entries in hg19KnownGeneMrna and
> hg19KnownGene.
> 1. uc002quk.1
> 2. uc021xvt.1
> 3. uc002qul.2
>
> These entries have incompatible transcript lengths.
>
> For example, the length of transcript uc002quk.1 given in hg19KnownGeneMrna
> is 1632 but the sum of all exon lengths in hg19KnownGene is 1687. There is a
> difference in length of 55.
>
> Please look into this issue, as the 3 entries that I have given to you are
> just the tip of the iceberg. In total there are 23695 incompatibilities.
>
> Thank you.
>
> Best Regards,
> Wilson
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
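
For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the exon-concatenation approach described for knownGeneTxMrna above. It is an illustration only, not the code used to build the table. The function names (parse_exons, exonic_length, spliced_sequence) are hypothetical, the toy chromosome sequence is made up, and the sketch assumes exon coordinates are given as comma-separated lists with 0-based, half-open intervals, with minus-strand transcripts reverse-complemented after concatenation.

```python
# Illustrative sketch only: hypothetical helper names, toy data.
# Assumes exonStarts/exonEnds are comma-separated strings (possibly with a
# trailing comma) holding 0-based, half-open coordinates.

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def parse_exons(exon_starts, exon_ends):
    """Turn two comma-separated coordinate strings into (start, end) pairs."""
    starts = [int(x) for x in exon_starts.strip(",").split(",")]
    ends = [int(x) for x in exon_ends.strip(",").split(",")]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    return list(zip(starts, ends))


def exonic_length(exons):
    """Sum of exon lengths, i.e. the expected length of the spliced transcript."""
    return sum(end - start for start, end in exons)


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequence(chrom_seq, exons, strand):
    """Concatenate the exon sequences taken from the reference sequence."""
    seq = "".join(chrom_seq[start:end] for start, end in exons)
    return reverse_complement(seq) if strand == "-" else seq


# Toy example: an 18-base "chromosome" with two 3-base exons.
toy_chrom = "AAACCCGGGTTTAAACCC"
toy_exons = parse_exons("3,12,", "6,15,")
plus_tx = spliced_sequence(toy_chrom, toy_exons, "+")
minus_tx = spliced_sequence(toy_chrom, toy_exons, "-")
```

A sequence built this way always has a length equal to the sum of the exon lengths, so it should agree with the exon coordinates in knownGene. A sequence taken from knownGeneMrna need not, which is how a pair such as 1687 versus 1632 can arise.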
