Hi Hervé,

Thank you for your suggestions. I have entered requests for fixes to these issues. The requests will be reviewed and assessed by our management.
Please don't hesitate to contact the mail list again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

Hervé Pagès wrote, On 8/16/2010 12:37 PM:
> Hi Rachel and Katrina,
>
> Thanks for looking into this and for the extensive explanations, that's
> very useful! I have one comment/suggestion below.
>
> On 08/16/2010 09:46 AM, Rachel Harte wrote:
>
>> Hello Hervé,
>>
>> I have some more information to add to the reply below after reviewing
>> all the CCDS in your list below. I can confirm that all the CCDS in your
>> list that have in-frame stop codons are for genes encoding
>> selenoproteins, so the TGA stop codon in these transcripts is translated
>> as the amino acid selenocysteine. At the moment, the UCSC Genome
>> Browser does not recognise these TGA codons as selenocysteine codons, so
>> they are coloured red as if they are stop codons when you zoom in to the
>> base level on the Genome Browser. The CCDS with non-ATG start codons all
>> have alternative translation start codons such as CTG or GTG, and there
>> is experimental evidence that suggests that these alternate start codons
>> are the predominant ones used for these genes. I am going to update the
>> CCDS track description page on the UCSC Genome Browser to explain these
>> exceptions to the criteria listed there.
>>
>
> Thanks for this. What about the protein sequence provided by the
> Browser? If I understand correctly, it should not be truncated at the
> first in-frame stop codon because this codon actually gets translated.
>
>
>> Thank you for drawing our
>> attention to this omission.
>>
>> Finally, the three CCDS that have nucleotide lengths that are not
>> divisible by three are pending withdrawal from the CCDS set. This is
>> because these CCDS are from genes that are known to be polymorphic, and
>> the reference genome allele contains a 1 nt insert and cannot encode the
>> protein: the 1 nt insert causes a frameshift in translation, which can
>> cause the protein to be truncated and to contain erroneous sequence, so
>> that the protein is not likely to be functional. Since CCDS is an
>> annotation of the reference genome, we cannot create a CCDS on the
>> reference genome that encodes the normal protein for these genes.
>>
>
> I see. Thanks for the clarification.
>
>
>> CCDS is constantly being reviewed for such cases and for new evidence
>> that requires the CCDS to be updated. NCBI releases a new CCDS set
>> periodically and then these updates come into effect. If you do see any
>> other potential problems with CCDS, then please notify the CCDS group at
>> [email protected]. Thank you.
>>
>
> I will. Thanks again for the clarifications!
>
> Cheers,
> H.
>
>
>> Rachel
>>
>> On 8/13/10 3:00 PM, Katrina Learned wrote:
>>
>>> Hi Hervé,
>>>
>>> Thank you for your email. One of our staff members is also part of
>>> the CCDS project and she has offered the following information:
>>>
>>> CCDS43034.1 is actually a selenoprotein (SELO, selenoprotein O) and so
>>> it has an in-frame stop codon because, in this protein, the in-frame
>>> stop codon is translated to a selenocysteine. We are currently
>>> determining if this is the case for the other CCDS you found with
>>> in-frame stop codons.
>>>
>>> As for the CCDS without start codons, there are some CCDS that have
>>> been annotated with a non-ATG start codon, e.g. CTG, where there is
>>> experimental evidence to suggest that the protein is translated from
>>> the non-ATG start codon.
>>>
>>> Finally, CCDS is constantly being updated, and so the project members
>>> are continually reviewing CCDS and correcting any errors or updating
>>> annotations based on additional evidence that becomes available. These
>>> updates are released periodically.
>>>
>>> We are currently looking into your additional observations in more
>>> detail. Please don't hesitate to contact the mail list again if you
>>> have any further questions.
>>>
>>> Katrina Learned
>>> UCSC Genome Bioinformatics Group
>>>
>>> Hervé Pagès wrote, On 08/13/10 12:50:
>>>
>>>> Hi,
>>>>
>>>> According to the Methods section of the CCDS track page for hg18,
>>>> one of the criteria used to assess each gene is:
>>>>
>>>> - an initiating ATG, a valid stop codon, and no in-frame stop codons
>>>>
>>>> However, when using some tools to extract and translate the transcripts
>>>> for all the genes in the track, I find that some of the genes fail to
>>>> satisfy the criteria. More precisely:
>>>>
>>>> - 21 genes fail to have an initiating ATG (e.g. CCDS43136.1,
>>>>   CCDS34059.1, etc..., see full listing at the end of the email).
>>>>
>>>> - 15 genes have one or more in-frame stop codons. E.g. the
>>>>   CCDS43034.1 gene (on chr22 strand +) has an in-frame stop
>>>>   codon 9 bases upstream of the stop codon located at the position
>>>>   specified in the cdsEnd column of the ccdsGene table for
>>>>   that gene.
>>>>
>>>> When using the Genome Browser to display CCDS43136.1 and CCDS43034.1
>>>> for hg18, I can *see* a confirmation of the problem. But if I click on
>>>> the CCDS43034.1 gene and then follow the link to the protein sequence
>>>> then the sequence is truncated at the in-frame stop codon, not at the
>>>> stop codon located at ccdsGene.cdsEnd. So I'm wondering why
>>>> ccdsGene.cdsEnd isn't set to the end of the effective stop codon?
>>>>
>>>> For hg19, the situation is slightly worse. In addition to having genes
>>>> with the same problems as reported above, 3 genes have a cumulative
>>>> CDS length that is not even a multiple of 3 (CCDS47664.1, CCDS47663.1
>>>> and CCDS45377.1).
>>>>
>>>> I would be very thankful if someone could provide some insight into
>>>> this.
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>> Full listing of failing genes for hg18:
>>>> - without an initiating ATG:
>>>>   CCDS43136.1, CCDS34059.1, CCDS43376.1, CCDS34458.1, CCDS34457.1,
>>>>   CCDS34737.1, CCDS6359.2, CCDS35004.1, CCDS35044.1, CCDS7878.2,
>>>>   CCDS7877.2, CCDS41618.1, CCDS31428.1, CCDS31730.1, CCDS31729.1,
>>>>   CCDS42102.1, CCDS32514.1, CCDS33104.1, CCDS33460.1, CCDS33646.1,
>>>>   CCDS33647.1
>>>> - with one or more in-frame stop codons:
>>>>   CCDS41340.1, CCDS41339.1, CCDS41283.1, CCDS41282.1, CCDS43091.1,
>>>>   CCDS43389.1, CCDS43432.1, CCDS41964.1, CCDS41992.1, CCDS42100.1,
>>>>   CCDS42150.1, CCDS42457.1, CCDS42981.1, CCDS43003.1, CCDS43034.1
>>>>
>>>>
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome

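For readers who want to reproduce the kind of check described in the original report, here is a minimal sketch in Python. It is not the tool used in the thread, and `check_cds` is a hypothetical helper name. It assumes the input is the spliced, sense-strand CDS (exons concatenated between cdsStart and cdsEnd, reverse-complemented for minus-strand genes) and that the terminal stop codon is included, as the report implies for ccdsGene.cdsEnd. The example sequences are synthetic, not real CCDS entries.

```python
# Hypothetical helper; not part of any UCSC or CCDS tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return the list of track criteria that a spliced CDS fails.

    Assumes `cds` is the sense-strand coding sequence, including the
    terminal stop codon.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("no initiating ATG")
    # Split into complete codons only; a trailing partial codon is dropped.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")
    # Any stop codon before the last codon is an in-frame stop.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems


if __name__ == "__main__":
    for seq in ("ATGAAATAA", "CTGAAATAA", "ATGTGAAAATAA", "ATGAAATAAA"):
        print(seq, check_cds(seq))
```

As the replies in this thread explain, a flagged CCDS is not necessarily an annotation error: an in-frame TGA may encode selenocysteine, and a non-ATG start such as CTG or GTG may be experimentally supported, so hits from a check like this call for manual review.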