Hello,

I have downloaded the refGene table from the RefSeq Genes track (hg18) and
found the following problem: for hundreds of protein-coding
transcripts, the length of the coding region is not a multiple of
three.

As an example, I checked transcript NM_000804. According to the NCBI
nucleotide DB record for this transcript, the coding length is 738
(which is fine: 738 = 246*3), but when I calculate the coding region
length from the coordinates provided in refGene, I get 736.

To understand where the difference comes from, I compared the exon
lengths and found that the problem is in exon3: that exon is 2
nucleotides shorter in refGene (see below).

Tx=NM_000804, (chr11)

NCBI nucleotide DB info
http://www.ncbi.nlm.nih.gov/nuccore/9257219
=============================================
      exon1            1..44    Len=44
      exon2            45..218  Len=174
      exon3            219..407 Len=189
      exon4            408..543 Len=136
      exon5            544..847 Len=304

      CDS             51..788   Len=738
      polyA_site      847


RefSeq table downloaded from UCSC
=======================================
exon1 len=44,  exS=71524418, exE=71524462
exon2 len=174, exS=71524640, exE=71524814
exon3 len=187, exS=71527654, exE=71527841 <----- (len is 187 instead of 189)
exon4 len=136, exS=71528038, exE=71528174
exon5 len=304, exS=71528278, exE=71528582

5utrL=50, cdsL=736, 3utrL=59, mRNA_L=845
------------------------------------------------------
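
The arithmetic behind the figures above can be sketched as follows. This is only a minimal Python sketch: the variable names are mine, the coordinates and UTR lengths are the ones listed in this message, and it assumes exon length is computed as exE - exS, which is what I used for the table.

```python
# Exon (exS, exE) pairs for NM_000804 as listed above (refGene, hg18, chr11).
ucsc_exons = [
    (71524418, 71524462),
    (71524640, 71524814),
    (71527654, 71527841),
    (71528038, 71528174),
    (71528278, 71528582),
]

# Exon lengths taken from the NCBI nucleotide record listed above.
ncbi_lengths = [44, 174, 189, 136, 304]

# Assumption: length = exE - exS (no +1), as used in the table above.
ucsc_lengths = [end - start for start, end in ucsc_exons]

for i, (u, n) in enumerate(zip(ucsc_lengths, ncbi_lengths), start=1):
    note = "" if u == n else f"  <-- differs by {n - u}"
    print(f"exon{i}: refGene={u} NCBI={n}{note}")

# UTR lengths from the summary line above (5utrL=50, 3utrL=59).
utr5_len, utr3_len = 50, 59
mrna_len = sum(ucsc_lengths)
cds_len = mrna_len - utr5_len - utr3_len

print(f"mRNA_L={mrna_len}, cdsL={cds_len}, cdsL mod 3 = {cds_len % 3}")
```

Only exon3 is flagged, and the remainder of cdsL after division by 3 is non-zero, which is the problem described above.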

•       Could you please check why, for many protein-coding transcripts, the
length of the coding region is not a multiple of three?

•       Another problem that I encountered when calculating exon lengths
was that, in order to get the correct length (according to the NCBI
nucleotide DB), one has to calculate (exonEnd - exonStart) rather than
what I expected: (exonEnd - exonStart + 1). It seems that the exonStart
positions (but not the exonEnd ones) are shifted by -1. Is this indeed
the case?
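
To make the second question concrete, here is a small Python sketch that compares both length formulas against the NCBI exon lengths for NM_000804. The function name and the `offset` parameter are hypothetical and used only for illustration. The idea that the start positions are shifted by -1 is my reading of the numbers, not something I have confirmed about the table format.

```python
# Exon (start, end) pairs for NM_000804 from the refGene table (hg18, chr11).
ucsc_exons = [
    (71524418, 71524462),
    (71524640, 71524814),
    (71527654, 71527841),
    (71528038, 71528174),
    (71528278, 71528582),
]

# Exon lengths from the NCBI nucleotide record.
ncbi_lengths = [44, 174, 189, 136, 304]


def count_matches(exons, expected, offset):
    """Count exons whose (end - start + offset) equals the expected length."""
    return sum(
        1 for (start, end), n in zip(exons, expected) if end - start + offset == n
    )


# offset=0: end - start, i.e. starts treated as shifted by -1.
matches_without_plus_one = count_matches(ucsc_exons, ncbi_lengths, offset=0)
# offset=1: end - start + 1, i.e. both positions treated as 1-based inclusive.
matches_with_plus_one = count_matches(ucsc_exons, ncbi_lengths, offset=1)

print("end - start     matches:", matches_without_plus_one, "of", len(ucsc_exons))
print("end - start + 1 matches:", matches_with_plus_one, "of", len(ucsc_exons))
```

With (end - start), every exon except exon3 agrees with NCBI; with (end - start + 1), none of them do.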

Many thanks in advance,
Rani



