Hello Rani,

We are currently looking into this and will contact you shortly. Thank you for your patience.
Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

On 09/29/10 03:19, [email protected] wrote:
> Hello,
>
> I have downloaded the refGene table from the RefSeq Genes track (hg18) and
> found the following problem: for hundreds of protein-coding
> transcripts, the length of the coding region is not a whole
> multiple of three.
>
> As one example, I checked transcript NM_000804. According to the NCBI
> nucleotide DB record for this transcript, the coding length is 738
> (which is fine: 738 = 246*3), but calculating the coding region length
> from the coordinates provided in refGene gives 736.
>
> To understand where the difference comes from, I compared the exons’
> lengths and found that the problem is in exon3: there is a difference
> of 2 nucleotides in that exon (see below).
>
> Tx=NM_000804, (chr11)
>
> NCBI nucleotide DB info
> http://www.ncbi.nlm.nih.gov/nuccore/9257219
> =============================================
> exon1 1..44 Len=44
> exon2 45..218 Len=174
> exon3 219..407 Len=189
> exon4 408..543 Len=136
> exon5 544..847 Len=304
>
> CDS 51..788 Len=738
> polyA_site 847
>
>
> RefSeq table downloaded from UCSC
> =======================================
> exon1 len=44, exS=71524418, exE=71524462
> exon2 len=174, exS=71524640, exE=71524814
> exon3 len=187, exS=71527654, exE=71527841 <----- (len is 187 instead of 189)
> exon4 len=136, exS=71528038, exE=71528174
> exon5 len=304, exS=71528278, exE=71528582
>
> 5utrL=50, cdsL=736, 3utrL=59, mRNA_L=845
> ------------------------------------------------------
>
> • Could you please check why, for many protein-coding transcripts, the
> length of the coding region is not a whole multiple of three?
>
> • Another problem that I encountered when calculating exons’ lengths
> was that in order to get the correct length (according to the NCBI
> nucleotide DB), one has to calculate (exonEnd - exonS) rather than
> what I expected: (exonEnd - exonS + 1). It seems that exonS positions
> (but not exonEnd ones) are shifted by -1. Is this indeed the case?
>
> Many thanks in advance,
> Rani
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
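
For reference, the length arithmetic described in the quoted message can be sketched in Python as below. This is only an illustration, not the Genome Browser's own code. It relies on UCSC tables storing zero-based start positions and one-based end positions (half-open intervals), which is why a length is `end - start` with no `+ 1`. The function names are hypothetical, and the `CDS_START` / `CDS_END` values were not read from the table: they are derived here from the quoted UTR lengths (5utrL=50, 3utrL=59), assuming a plus-strand transcript.

```python
# Exon coordinates for NM_000804 (chr11, hg18) as quoted in the message:
# (exS, exE) pairs, zero-based start and one-based end (half-open).
REFGENE_EXONS = [
    (71524418, 71524462),
    (71524640, 71524814),
    (71527654, 71527841),
    (71528038, 71528174),
    (71528278, 71528582),
]

# Exon lengths from the NCBI nucleotide record quoted in the message.
NCBI_EXON_LENGTHS = [44, 174, 189, 136, 304]

# ASSUMPTION: not taken from the refGene table. Derived from the quoted
# UTR lengths (50 and 59), assuming the transcript is on the plus strand.
CDS_START = 71524646  # 6 bases into exon2: 44 + 6 = 50 bases of 5' UTR
CDS_END = 71528523    # 59 bases before the end of exon5


def exon_lengths(exons):
    """Length of each half-open [start, end) interval: end - start."""
    return [end - start for start, end in exons]


def cds_length(exons, cds_start, cds_end):
    """Sum of the exon bases that fall inside [cds_start, cds_end)."""
    total = 0
    for start, end in exons:
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            total += overlap
    return total


if __name__ == "__main__":
    lengths = exon_lengths(REFGENE_EXONS)
    for number, (ucsc, ncbi) in enumerate(zip(lengths, NCBI_EXON_LENGTHS), 1):
        note = "" if ucsc == ncbi else "  <-- differs by %d" % (ncbi - ucsc)
        print("exon%d UCSC=%d NCBI=%d%s" % (number, ucsc, ncbi, note))
    cds = cds_length(REFGENE_EXONS, CDS_START, CDS_END)
    print("mRNA length:", sum(lengths))
    print("CDS length:", cds, "remainder mod 3:", cds % 3)
```

With these inputs the sketch reproduces the figures in the message: exon3 comes out 2 bases shorter than in the NCBI record, the mRNA length is 845, and the CDS length is 736, which leaves a remainder of 1 when divided by 3.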
