Hi Vikram,

Thank you for bringing this to our attention. As it turns out, blat, which is used to align the RefSeq transcripts to the genome, trims polyA tails. In this case, however, blat was also trimming the ends of the stop codons (TAA and TGA), so they were not aligning correctly to the genome. We are looking into a fix for this bug.

Best,
Mary

---------------------
Mary Goldman
UCSC Bioinformatics Group

On 8/5/10 11:00 PM, Vikram Katju wrote:
> I wish to draw your attention to the discrepancies I find in the CDS file
> for human data (please see attached screenshots to see the settings I used
> to download this file). I find that the length of the coding region of the
> following accessions is not a multiple of three. In other words, it is
> incomplete. My manual check tells me that the terminal exons in the CDS file
> are missing one or two bases at the end. However, there could be other
> variations on the theme. My random checks of some of the entries in the
> corresponding NCBI file show no discrepancy. I am listing some of the
> entries from the X chromosome below, but it is likely that this problem
> exists for entries on other chromosomes as well.
>
> Also, I find that some entries have been annotated on both strands (e.g.
> NM_001079538).
>
> Please have a look and do the needful.
>
> NM_001101357
> NM_001136234
> NM_138702
> NM_001004486
> NM_003868
> NM_005193
> NM_001007524
> NM_001013627
> NM_033380
> NM_001136273
> NM_001011719
> NM_001007523
> NM_001079538
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
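
For anyone who wants to repeat the check described in the quoted message, here is a minimal sketch in Python. It assumes each coding sequence is already available as a plain string of bases; the function name check_cds and the example sequences are made up for illustration and are not taken from the CDS file or from any UCSC tool. It flags a sequence whose length is not a multiple of three and, when the length is fine, a sequence whose last codon is not one of the standard stop codons (TAA, TAG, TGA).

```python
# Standard stop codons in the nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq):
    """Return a list of problems found in one coding sequence (empty if none).

    Hypothetical helper for illustration only; `seq` is a string of bases.
    """
    seq = seq.strip().upper()
    problems = []
    if not seq:
        problems.append("sequence is empty")
        return problems
    remainder = len(seq) % 3
    if remainder:
        # A truncated terminal exon shows up here as 1 or 2 leftover bases.
        problems.append(
            f"length {len(seq)} is not a multiple of 3 (remainder {remainder})"
        )
    elif seq[-3:] not in STOP_CODONS:
        problems.append(f"last codon {seq[-3:]} is not a stop codon")
    return problems


if __name__ == "__main__":
    # Made-up sequences, not real RefSeq entries.
    examples = {
        "example_complete": "ATGAAATAA",
        "example_missing_one_base": "ATGAAATA",
        "example_missing_two_bases": "ATGAAAT",
    }
    for name, seq in examples.items():
        problems = check_cds(seq)
        print(name, "OK" if not problems else "; ".join(problems))
```

A sequence that has lost the final one or two bases of its stop codon would be reported by the length check, which matches the symptom reported for the accessions listed above.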
