Hi Trevor,

After reviewing this matter further, we can confirm that this is indeed a 
misalignment. It arises because the exon in question is quite small and 
its sequence is duplicated in the intron that follows it. The same exon 
in NM_000364 does not have this issue because two other exons align 
between block 3 and the intronic duplicate of the block 3 sequence. 
Thank you for reporting this - future work on the GenBank pipeline will 
aim to better handle difficult cases such as this one.
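
To make the duplication concrete, here is a minimal Python sketch. It is 
an illustration only and not part of the UCSC pipeline or BLAT; the names 
PLACEMENTS, SHARED_10MER, describe_block and contains_ignoring_case are 
hypothetical. The two snippets are copied from the diagrams quoted below 
(uppercase = aligned block 3, lowercase = flanking sequence, as displayed 
there). The sketch checks that the same 10-base sequence is present at 
both locations and reports the two bases on either side of each block, 
assuming the usual "ag ... gt" splice-site pattern.

```python
import re

# Snippets copied from the quoted diagrams: uppercase marks block 3,
# lowercase marks the flanking genomic sequence as displayed there.
PLACEMENTS = {
    "NM_000364": "cagtgtttcctgttgctctcagGGAGCAGGAAGgtaagcgtaaacgtgtg",
    "NM_001001432": "ctctggagagcagccagggagactggaaatagccaGAGCAGGAAGgacat",
}

# The 10 bases that make up block 3 of NM_001001432.
SHARED_10MER = "GAGCAGGAAG"


def describe_block(snippet):
    """Return (block, upstream 2 nt, downstream 2 nt, canonical flag)
    for the single uppercase run in the snippet."""
    match = re.search(r"[A-Z]+", snippet)
    if match is None:
        raise ValueError("no uppercase block in snippet")
    upstream = snippet[max(0, match.start() - 2):match.start()].lower()
    downstream = snippet[match.end():match.end() + 2].lower()
    # Assumption: "canonical" here means acceptor 'ag' before the block
    # and donor 'gt' after it.
    canonical = upstream == "ag" and downstream == "gt"
    return match.group(), upstream, downstream, canonical


def contains_ignoring_case(snippet, query):
    """True if query occurs in snippet, ignoring upper/lower case."""
    return query.upper() in snippet.upper()


if __name__ == "__main__":
    for name, snippet in PLACEMENTS.items():
        block, up, down, canonical = describe_block(snippet)
        shared = contains_ignoring_case(snippet, SHARED_10MER)
        print(name, block, up, down, "canonical" if canonical else "non-canonical",
              "contains 10-mer" if shared else "no 10-mer")
```

Because the 10 bases occur at both locations, sequence identity alone 
cannot distinguish the two placements for such a short block; in this 
sketch only the flanking bases differ.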

As you know, no single gene prediction track correctly predicts the 
alignment of every gene. Hand-curated tracks, such as Gencode, can be 
updated to reflect the current paradigm for the alignment of a 
particular gene. However, our RefSeq track is automatically generated 
by BLAT, so we cannot change the alignments on a case-by-case basis. 
I am sorry for any inconvenience this causes you.

Best,
Mary
---------------------
Mary Goldman
UCSC Bioinformatics Group

On 11/16/10 8:44 AM, Pugh, Trevor wrote:
> Hi Pauline,
>
> The purpose of our original e-mail was to point out that block 3 of NM_001001432
> was mismapped by BLAT and should correspond to the position of block 3 in
> NM_000364. The error appears to stem from the incorrect mapping of a single GG
> repeat at the beginning of this block that results in the entire exon being
> positioned incorrectly. This may also apply to the mapping of GenBank mRNA
> X79861. We believe this to be an erroneous BLAT alignment that should be fixed
> in the UCSC RefGene track entry for this gene.
>
> In case the formatting of our original diagram was lost through e-mail, I have
> attached a PDF version.
>
> - Trevor
>
> -----Original Message-----
> From: Pauline Fujita [mailto:[email protected]]
> Sent: Monday, November 15, 2010 5:35 PM
> To: Pugh, Trevor
> Cc: [email protected]; Duffy, Elizabeth
> Subject: Re: [Genome] Mismapping of TNNT2 isoforms in RefGene track
>
> Hello Trevor,
>
> One of our developers had this to say about your issue:
>
> NM_001001432 block 3 is at different chromosomal coordinates than
> NM_000364 block 3. These blocks are generated by blat alignment. It
> appears that NM_001001432 block 3 doesn't annotate a reference genome
> exon and I suspect this is a polymorphism with the reference genome.
> While it is only supported by one GenBank mRNA X79861, the mRNA and EST
> tracks show evidence of structural polymorphisms in this genome.
>
> The block structure of the two alignments in this region, in positive-strand,
> zero-based coordinates, is:
>
> NM_001001432
>           start       end         size
> block 2   201342340   201342396   56
> block 3   201338460   201338470   10
> block 4   201337289   201337355   66
>
> NM_000364
>           start       end         size
> block 2   201342341   201342396   55
> block 3   201341272   201341283   11
> block 4   201341154   201341169   15
>
>
> If you have further questions or require clarification feel free to
> contact the mailing list at [email protected].
>
> Regards,
>
> Pauline Fujita
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> On 11/11/10 11:41, Pugh, Trevor wrote:
>    
>>> Dear UCSC Curators,
>>>
>>> We have been examining the mapping of various transcript models of TNNT2 and
>>> appear to have uncovered a mapping error of NM_001001432 in the UCSC RefGene
>>> Track, hg19 build. The track lists block 3 as mapping to a genomic region
>>> lacking splice donor and acceptor sites and to be inconsistent with other
>>> transcript models with similar mRNA sequence such as NM_000364. This appears
>>> to be due to a mismapping of the 5' GG of block 3 which is correctly mapped in
>>> NM_000364 but is broken across blocks 2 and 3 in NM_001001432. We have
>>> included a diagram of this phenomenon below. Blue denotes exonic sequence,
>>> cyan denotes the mismapped GG, and pink denotes the splice recognition
>>> locations.
>>>
>>> We bring this to your attention so that you may evaluate the mapping of the
>>> transcript yourself and a correction be made, if warranted. Please let us know
>>> if you require additional information.
>>>
>>> Thank you,
>>>
>>> Trevor Pugh and Beth Duffy
>>> Laboratory for Molecular Medicine
>>> Partners Healthcare Center for Personalized Genetic Medicine
>>> Cambridge, MA
>>>
>>> Block 2+3 from NM_001001432 (note: no donor/acceptor sites)
>>> atcagcaggt ggccttgctg ccatgtgggt gtcactatct cccccagcag  201342556
>>> gggagaaaac aggctttttg ttgcaggtca cacagctcat gaggggtgga  201342506
>>> actagattca ccctaggcct cgctgatctc tgtacaacgg gggccagagc  201342456
>>> tcttctgagg aaggcaggct tccctttgta cctgcactga cttttttctc  201342406
>>> cttttggagG GAGAGCAGAG ACCATGTCTG ACATAGAAGA GGTGGTGGAA  201342356
>>> GAGTACGAGG AGGAGtgagt atctggagca tcttgcctga gtggggtcct  201342306
>>> ctcccgccgc tgccctgaca cctggtccag gagcctccca gctgtccctc  201342256
>>> ...
>>> gcaccaagca gggtggccag gtgttggttg gggggtctgg ggacagagtc  201338506
>>> ctctggagag cagccaggga gactggaaat agccaGAGCA GGAAGgacat  201338456
>>> gacgtcagcc ttcagatgcg ccctgctgat ggggagcaca ggaccaaggc  201338406
>>> aagggagtga gaccagggct taattttaga aagtgcgttc tgacagctat  201338356
>>>
>>>
>>> CDS NM_001001432.1
>>>   1 atgtctgaca tagaagaggt ggtggaagag tacgaggagg aggagcagga agagcaggag
>>> 61 gaggcagcgg aagaggatgc tgaagcagag gctgagaccg aggagaccag ggcagaagaa
>>>
>>> _______________________________________________________________________
>>>
>>> Block 2+3 from NM_000364 (note: correct splice sites)
>>> cttttggagG GAGAGCAGAG ACCATGTCTG ACATAGAAGA GGTGGTGGAA  201342356
>>> GAGTACGAGG AGGAgtgagt atctggagca tcttgcctga gtggggtcct  201342306
>>> ctcccgccgc tgccctgaca cctggtccag gagcctccca gctgtccctc  201342256
>>> ggattctggg tagaagtagc tgtgtgtgtt ttgggcaccc cgaggagaga  201342206
>>> ...
>>> ctagtgggtg tcattgcaag gtgggcaggg cagcgtggac tccactaggc  201341406
>>> aacaagggaa aagaaagggg gattatcttt ggggaaaggc cagtgtgtgc  201341356
>>> atgtgtgtgc aggcgtgtgt gtttgcatgt gcttgtgtgc gagctactga  201341306
>>> cagtgtttcc tgttgctctc agGGAGCAGG AAGgtaagcg taaacgtgtg  201341256
>>> tactcatttg gatcaaagac agcctggttc gaaactgacc cacctcttct  201341206
>>>
>>> CDS NM_000364
>>>   1 atgtctgaca tagaagaggt ggtggaagag tacgaggagg aggagcagga agaagcagct
>>> 61 gttgaagaag aggaggactg gagagaggac gaagacgagc aggaggaggc agcggaagag
>>>
>>>        
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
