Hi Trevor,

After reviewing this matter further, we can confirm that this is indeed a misalignment. It occurs because the exon in question is quite small and its sequence is duplicated in the intron that follows it. The same exon in NM_000364 does not have this issue because two other exons align between block 3 and the duplicated block 3 sequence in the intron. Thank you for reporting this. Future work on the GenBank pipeline will aim to handle difficult cases such as this one better.
As you know, no single gene prediction track correctly predicts the alignment of every gene. Hand-curated tracks, such as Gencode, can be updated to reflect the current paradigm for the alignment of a particular gene. However, our RefSeq track is generated automatically by BLAT, so we cannot change the alignments on a case-by-case basis. I am sorry for any inconvenience this causes you.

Best,
Mary

---------------------
Mary Goldman
UCSC Bioinformatics Group

On 11/16/10 8:44 AM, Pugh, Trevor wrote:
> Hi Pauline,
>
> The point of our original e-mail was to point out that block 3 of NM_001001432
> was mismapped by BLAT and should correspond to the position of block 3 in
> NM_000364. The error appears to stem from the incorrect mapping of a single GG
> repeat at the beginning of this block that results in the entire exon being
> positioned incorrectly. This may also apply to the mapping of GenBank mRNA
> X79861. We believe this to be an erroneous BLAT alignment that should be fixed
> in the UCSC RefGene track entry for this gene.
>
> In case the formatting of our original diagram was lost through e-mail, I have
> attached a PDF version.
>
> - Trevor
>
> -----Original Message-----
> From: Pauline Fujita [mailto:[email protected]]
> Sent: Monday, November 15, 2010 5:35 PM
> To: Pugh, Trevor
> Cc: [email protected]; Duffy, Elizabeth
> Subject: Re: [Genome] Mismapping of TNNT2 isoforms in RefGene track
>
> Hello Trevor,
>
> One of our developers had this to say about your issue:
>
> NM_001001432 block 3 is at different chromosomal coordinates than
> NM_000364 block 3. These blocks are generated by blat alignment. It
> appears that NM_001001432 block 3 doesn't annotate a reference genome
> exon and I suspect this is a polymorphism with the reference genome.
> While it is only supported by one GenBank mRNA X79861, the mRNA and EST
> tracks show evidence of structural polymorphisms in this genome.
>
> The block structure of the two alignments in this region, in positive-strand,
> zero-based coordinates, is:
>
> NM_001001432
>          start     end       size
> block 2  201342340 201342396 56
> block 3  201338460 201338470 10
> block 4  201337289 201337355 66
>
> NM_000364
>          start     end       size
> block 2  201342341 201342396 55
> block 3  201341272 201341283 11
> block 4  201341154 201341169 15
>
>
> If you have further questions or require clarification feel free to
> contact the mailing list at [email protected].
>
> Regards,
>
> Pauline Fujita
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On 11/11/10 11:41, Pugh, Trevor wrote:
>
>>> Dear UCSC Curators,
>>>
>>> We have been examining the mapping of various transcript models of TNNT2 and
>>> appear to have uncovered a mapping error of NM_001001432 in the UCSC RefGene
>>> Track, hg19 build. The track lists block 3 as mapping to a genomic region
>>> lacking splice donor and acceptor sites and to be inconsistent with other
>>> transcript models with similar mRNA sequence such as NM_000364. This appears
>>> to be due to a mismapping of the 5' GG of block 3 which is correctly mapped in
>>> NM_000364 but is broken across blocks 2 and 3 in NM_001001432. We have
>>> included a diagram of this phenomenon below. Blue denotes exonic sequence,
>>> cyan denotes the mismapped GG, and pink denotes the splice recognition
>>> locations.
>>>
>>> We bring this to your attention so that you may evaluate the mapping of the
>>> transcript yourself and a correction be made, if warranted. Please let us know
>>> if you require additional information.
>>>
>>> Thank you,
>>>
>>> Trevor Pugh and Beth Duffy
>>> Laboratory for Molecular Medicine
>>> Partners Healthcare Center for Personalized Genetic Medicine
>>> Cambridge, MA
>>>
>>> Block 2+3 from NM_001001432 (note: no donor/acceptor sites)
>>> atcagcaggt ggccttgctg ccatgtgggt gtcactatct cccccagcag 201342556
>>> gggagaaaac aggctttttg ttgcaggtca cacagctcat gaggggtgga 201342506
>>> actagattca ccctaggcct cgctgatctc tgtacaacgg gggccagagc 201342456
>>> tcttctgagg aaggcaggct tccctttgta cctgcactga cttttttctc 201342406
>>> cttttggagG GAGAGCAGAG ACCATGTCTG ACATAGAAGA GGTGGTGGAA 201342356
>>> GAGTACGAGG AGGAGtgagt atctggagca tcttgcctga gtggggtcct 201342306
>>> ctcccgccgc tgccctgaca cctggtccag gagcctccca gctgtccctc 201342256
>>> ...
>>> gcaccaagca gggtggccag gtgttggttg gggggtctgg ggacagagtc 201338506
>>> ctctggagag cagccaggga gactggaaat agccaGAGCA GGAAGgacat 201338456
>>> gacgtcagcc ttcagatgcg ccctgctgat ggggagcaca ggaccaaggc 201338406
>>> aagggagtga gaccagggct taattttaga aagtgcgttc tgacagctat 201338356
>>>
>>>
>>> CDS NM_001001432.1
>>> 1 atgtctgaca tagaagaggt ggtggaagag tacgaggagg aggagcagga agagcaggag
>>> 61 gaggcagcgg aagaggatgc tgaagcagag gctgagaccg aggagaccag ggcagaagaa
>>>
>>> _______________________________________________________________________
>>>
>>> Block 2+3 from NM_000364 (note: correct splice sites)
>>> cttttggagG GAGAGCAGAG ACCATGTCTG ACATAGAAGA GGTGGTGGAA 201342356
>>> GAGTACGAGG AGGAgtgagt atctggagca tcttgcctga gtggggtcct 201342306
>>> ctcccgccgc tgccctgaca cctggtccag gagcctccca gctgtccctc 201342256
>>> ggattctggg tagaagtagc tgtgtgtgtt ttgggcaccc cgaggagaga 201342206
>>> ...
>>> ctagtgggtg tcattgcaag gtgggcaggg cagcgtggac tccactaggc 201341406
>>> aacaagggaa aagaaagggg gattatcttt ggggaaaggc cagtgtgtgc 201341356
>>> atgtgtgtgc aggcgtgtgt gtttgcatgt gcttgtgtgc gagctactga 201341306
>>> cagtgtttcc tgttgctctc agGGAGCAGG AAGgtaagcg taaacgtgtg 201341256
>>> tactcatttg gatcaaagac agcctggttc gaaactgacc cacctcttct 201341206
>>>
>>> CDS NM_000364
>>> 1 atgtctgaca tagaagaggt ggtggaagag tacgaggagg aggagcagga agaagcagct
>>> 61 gttgaagaag aggaggactg gagagaggac gaagacgagc aggaggaggc agcggaagag
>>>
>>>
>>
>> The information in this e-mail is intended only for the person to whom it is
>> addressed. If you believe this e-mail was sent to you in error and the e-mail
>> contains patient information, please contact the Partners Compliance HelpLine at
>> http://www.partners.org/complianceline . If the e-mail was sent to you in error
>> but does not contain patient information, please contact the sender and properly
>> dispose of the e-mail.
>>
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
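
For readers who want to check the figures quoted in this thread, here is a minimal Python sketch, not anything taken from the UCSC pipeline. It checks that each quoted block size equals end minus start, and it looks at the two bases on either side of block 3 in the two sequence excerpts from the original report. The names `BLOCKS`, `EXCERPTS`, `sizes_consistent` and `exon_with_flanks` are hypothetical. The sketch assumes that the smaller coordinate in each row is the start of a half-open interval, and that uppercase letters in the excerpts mark the aligned (exonic) bases.

```python
import re

# Block coordinates as quoted in the thread (positive strand, zero-based).
# Assumption: the smaller number in each row is the start, the larger the end.
BLOCKS = {
    "NM_001001432": [(2, 201342340, 201342396, 56),
                     (3, 201338460, 201338470, 10),
                     (4, 201337289, 201337355, 66)],
    "NM_000364": [(2, 201342341, 201342396, 55),
                  (3, 201341272, 201341283, 11),
                  (4, 201341154, 201341169, 15)],
}

# The excerpt line containing block 3 for each transcript, copied from the
# original report with the spaces removed. Assumption: uppercase = exonic.
EXCERPTS = {
    "NM_001001432": "ctctggagagcagccagggagactggaaatagccaGAGCAGGAAGgacat",
    "NM_000364": "cagtgtttcctgttgctctcagGGAGCAGGAAGgtaagcgtaaacgtgtg",
}


def sizes_consistent(blocks):
    """True if every quoted size equals end - start (half-open interval)."""
    return all(end - start == size for _, start, end, size in blocks)


def exon_with_flanks(excerpt):
    """Return the first uppercase run plus the two bases on each side of it."""
    match = re.search(r"[ACGT]+", excerpt)
    exon = match.group(0)
    upstream = excerpt[max(0, match.start() - 2):match.start()]
    downstream = excerpt[match.end():match.end() + 2]
    return exon, upstream, downstream


if __name__ == "__main__":
    for name, blocks in BLOCKS.items():
        print(name, "sizes consistent:", sizes_consistent(blocks))
    for name, excerpt in EXCERPTS.items():
        exon, up, down = exon_with_flanks(excerpt)
        # Canonical introns end in "ag" before an exon and begin with "gt" after it.
        canonical = (up == "ag" and down == "gt")
        print(name, exon, len(exon), up, down, "canonical flanks:", canonical)
```

With the excerpts above, the NM_000364 placement of block 3 is flanked by `ag` and `gt`, while the NM_001001432 placement is not. The 10-base block from NM_001001432 is also the 11-base block from NM_000364 minus its leading G, which fits the report that the GG was split across blocks 2 and 3.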
