Hi Hervé,

Thank you for your email. One of our staff members is also part of the CCDS project, and she has offered the following information:
CCDS43034.1 is actually a selenoprotein (SELO, selenoprotein O), so its in-frame stop codon is expected: in this protein, that codon is translated as a selenocysteine. We are currently determining whether this is also the case for the other CCDS entries you found with in-frame stop codons.

As for the CCDS entries without an initiating ATG, some have been annotated with a non-ATG start codon (e.g. CTG) where there is experimental evidence to suggest that the protein is translated from that codon.

Finally, CCDS is constantly being updated: the project members continually review CCDS entries, correcting errors and updating annotations as additional evidence becomes available. These updates are released periodically.

We are currently looking into your additional observations in more detail. Please don't hesitate to contact the mailing list again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

Hervé Pagès wrote, On 08/13/10 12:50:
> Hi,
>
> According to the Methods section of the CCDS track page for hg18,
> one of the criteria used to assess each gene is:
>
>   - an initiating ATG, a valid stop codon, and no in-frame stop codons
>
> However, when using some tools to extract and translate the transcripts
> for all the genes in the track, I find that some of the genes fail to
> satisfy the criteria. More precisely:
>
>   - 21 genes fail to have an initiating ATG (e.g. CCDS43136.1,
>     CCDS34059.1, etc.; see full listing at the end of the email).
>
>   - 15 genes fail to have no in-frame stop codons. E.g. the
>     CCDS43034.1 gene (on chr22 strand +) has an in-frame stop
>     codon 9 bases upstream of the stop codon located at the position
>     specified in the cdsEnd column of the ccdsGene table for
>     that gene.
>
> When using the Genome Browser to display CCDS43136.1 and CCDS43034.1
> for hg18, I can *see* a confirmation of the problem. But if I click on
> the CCDS43034.1 gene and then follow the link to the protein sequence
> then the sequence is truncated at the in-frame stop codon, not at the
> stop codon located at ccdsGene.cdsEnd. So I'm wondering why
> ccdsGene.cdsEnd isn't set to the end of the effective stop codon?
>
> For hg19, the situation is slightly worse. In addition to having genes
> with the same problems as reported above, 3 genes have a cumulative
> CDS length that is not even a multiple of 3 (CCDS47664.1, CCDS47663.1
> and CCDS45377.1).
>
> I would be very thankful if someone could provide some insight about
> this.
>
> Thanks,
> H.
>
> Full listing of failing genes for hg18:
>   - without an initiating ATG:
>       CCDS43136.1, CCDS34059.1, CCDS43376.1, CCDS34458.1, CCDS34457.1,
>       CCDS34737.1, CCDS6359.2, CCDS35004.1, CCDS35044.1, CCDS7878.2,
>       CCDS7877.2, CCDS41618.1, CCDS31428.1, CCDS31730.1, CCDS31729.1,
>       CCDS42102.1, CCDS32514.1, CCDS33104.1, CCDS33460.1, CCDS33646.1,
>       CCDS33647.1
>   - with one or more in-frame stop codons:
>       CCDS41340.1, CCDS41339.1, CCDS41283.1, CCDS41282.1, CCDS43091.1,
>       CCDS43389.1, CCDS43432.1, CCDS41964.1, CCDS41992.1, CCDS42100.1,
>       CCDS42150.1, CCDS42457.1, CCDS42981.1, CCDS43003.1, CCDS43034.1
>
> _______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
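
The checks described in the quoted message (initiating ATG, valid terminal stop codon, no in-frame stop codons, CDS length a multiple of 3) can be sketched roughly as follows. This is only a minimal illustration in Python, not the tool used in the original message, which the message does not name. The function name check_cds is hypothetical, and the sketch assumes its input is the already-spliced coding sequence on the coding strand, running from cdsStart through cdsEnd with the terminal stop codon included.

```python
# Standard stop codons (DNA alphabet). TGA is also the codon that is
# read as selenocysteine in selenoproteins.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return a list of criteria that a spliced CDS fails.

    Assumption: `cds` is the concatenated coding exons, 5'->3' on the
    coding strand, with the terminal stop codon included.
    `check_cds` is a hypothetical helper, not part of any UCSC tool.
    """
    cds = cds.upper()
    problems = []

    # Criterion: cumulative CDS length must be a multiple of 3.
    remainder = len(cds) % 3
    if remainder != 0:
        problems.append("length not a multiple of 3")

    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - remainder, 3)]

    # Criterion: first codon must be ATG.
    if not codons or codons[0] != "ATG":
        problems.append("no initiating ATG")

    # Criterion: last complete codon must be a stop codon.
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")

    # Criterion: no stop codon anywhere before the last codon.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")

    return problems
```

Under these assumptions, a selenoprotein such as CCDS43034.1 would be reported as having an in-frame stop codon, and an entry annotated with a CTG start would be reported as lacking an initiating ATG. Entries flagged this way are therefore candidates for review rather than confirmed annotation errors.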
