Hello Hervé,

I have some more information to add to the reply below after reviewing 
all the CCDS in your list. I can confirm that all the CCDS in your list 
that have in-frame stop codons are for genes encoding selenoproteins, 
so the in-frame TGA codon in these transcripts is translated as the 
amino acid selenocysteine rather than read as a stop. At the moment, 
the UCSC Genome Browser does not recognise these TGA codons as 
selenocysteine codons, so they are coloured red as if they were stop 
codons when you zoom in to the base level. The CCDS with non-ATG start 
codons all have alternative translation start codons such as CTG or 
GTG, and there is experimental evidence suggesting that these 
alternative start codons are the predominant ones used for these genes. 
I am going to update the CCDS track description page on the UCSC Genome 
Browser to explain these exceptions to the criteria listed there. Thank 
you for drawing our attention to this omission.
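
To make these exceptions concrete, here is a minimal Python sketch of the 
kind of check described in the original question, extended with the two 
exceptions above. It is an illustration only, not the code used by the 
CCDS project or the Genome Browser. The function name check_cds and its 
parameters alt_starts and selenoprotein are hypothetical, and the 
sequences in the usage lines are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq, alt_starts=(), selenoprotein=False):
    """Return the list of criteria that a CDS sequence fails.

    alt_starts: extra start codons to accept (e.g. "CTG", "GTG").
    selenoprotein: if True, in-frame TGA is treated as selenocysteine
    instead of a stop. Both parameters are assumptions of this sketch.
    """
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")

    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return problems + ["empty CDS"]

    if codons[0] != "ATG" and codons[0] not in alt_starts:
        problems.append("no initiating ATG")
    if codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")

    # Look for stop codons strictly between the first and last codon.
    for codon in codons[1:-1]:
        if codon in STOP_CODONS and not (selenoprotein and codon == "TGA"):
            problems.append("in-frame stop codon")
            break
    return problems


# Made-up sequences, for illustration only.
print(check_cds("ATGAAATGAGGGTAA"))                      # in-frame TGA flagged
print(check_cds("ATGAAATGAGGGTAA", selenoprotein=True))  # TGA accepted
print(check_cds("CTGAAATAA", alt_starts=("CTG",)))       # CTG start accepted
```

A real check would need to know which specific TGA positions encode 
selenocysteine; treating every in-frame TGA as selenocysteine, as this 
sketch does, is a simplification.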

Finally, the three CCDS whose nucleotide lengths are not divisible by 
three are pending withdrawal from the CCDS set. These CCDS are from 
genes that are known to be polymorphic, and the reference genome allele 
contains a 1 nt insertion. The insertion causes a frameshift in 
translation, so the encoded protein can be truncated and contain 
erroneous sequence, and is unlikely to be functional. Since CCDS is an 
annotation of the reference genome, we cannot create a CCDS on the 
reference genome that encodes the normal protein for these genes.
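
The effect of a 1 nt insertion can be sketched in a few lines of Python. 
This is a toy example under stated assumptions: the sequence is made up 
and is not taken from any of the three CCDS, the translate helper is 
hypothetical, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first base, stopping at the first stop codon.

    A trailing partial codon (1 or 2 leftover bases) is ignored.
    """
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


normal = "ATGGCTGAAACCTTTGGATAA"            # made-up CDS, 21 nt
with_c = normal[:6] + "C" + normal[6:]      # 1 nt insertion, 22 nt
with_t = normal[:6] + "T" + normal[6:]      # different 1 nt insertion

print(len(normal) % 3, translate(normal))   # 0 MAETFG
print(len(with_c) % 3, translate(with_c))   # 1 MARNLWI (erroneous sequence)
print(len(with_t) % 3, translate(with_t))   # 1 MA (premature stop)
```

In both insertion cases the length is no longer a multiple of three, and 
everything downstream of the insertion is read in the wrong frame.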

CCDS is constantly being reviewed for such cases and for new evidence 
that requires the CCDS to be updated. NCBI releases a new CCDS set 
periodically and then these updates come into effect. If you do see any 
other potential problems with CCDS, then please notify the CCDS group at 
[email protected]. Thank you.

Rachel

On 8/13/10 3:00 PM, Katrina Learned wrote:
> Hi Hervé,
>
> Thank you for your email. One of our staff members is also part of the 
> CCDS project and she has offered the following information:
>
> CCDS43034.1 is actually a selenoprotein (SELO, selenoprotein O) and so 
> it has an in-frame stop codon because, in this protein, the in-frame 
> stop codon is translated to a selenocysteine. We are currently 
> determining if this is the case for the other CCDS you found with 
> in-frame stop codons.
>
> As for the CCDS without start codons, there are some CCDS that have been 
> annotated with a non-ATG start codon e.g. CTG where there is 
> experimental evidence to suggest that the protein is translated from the 
> non-ATG start codon.
>
> Finally, CCDS is constantly being updated, and so the project members 
> are continually reviewing CCDS and correcting any errors or updating 
> annotations based on additional evidence that becomes available. These 
> updates are released periodically.
>
> We are currently looking into your additional observations in more 
> detail. Please don't hesitate to contact the mail list again if you have 
> any further questions.
>
> Katrina Learned
> UCSC Genome Bioinformatics Group
>
> Hervé Pagès wrote, On 08/13/10 12:50:
>   
>> Hi,
>>
>> According to the Methods section of the CCDS track page for hg18,
>> one of the criteria used to assess each gene is:
>>
>>    - an initiating ATG, a valid stop codon, and no in-frame stop codons
>>
>> However when using some tools to extract and translate the transcripts
>> for all the genes in the track, I find that some of the genes fail to
>> satisfy the criteria. More precisely:
>>
>>    - 21 genes fail to have an initiating ATG (e.g. CCDS43136.1,
>>      CCDS34059.1, etc..., see full listing at the end of the email).
>>
>>    - 15 genes fail to have no in-frame stop codons. E.g. the
>>      CCDS43034.1 gene (on chr22 strand +) has an in-frame stop
>>      codon 9 bases upstream of the stop codon located at the position
>>      specified in the cdsEnd column of the ccdsGene table for
>>      that gene.
>>
>> When using the Genome Browser to display CCDS43136.1 and CCDS43034.1
>> for hg18, I can *see* a confirmation of the problem. But if I click on
>> the CCDS43034.1 gene and then follow the link to the protein sequence
>> then the sequence is truncated at the in-frame stop codon, not at the
>> stop codon located at ccdsGene.cdsEnd. So I'm wondering why isn't
>> ccdsGene.cdsEnd set to the end of the effective stop codon?
>>
>> For hg19, the situation is slightly worse. In addition to having genes
>> with the same problems as reported above, 3 genes have a cumulated
>> CDS length that is not even a multiple of 3 (CCDS47664.1, CCDS47663.1
>> and CCDS45377.1).
>>
>> I would be very thankful if someone could provide some insight about
>> this.
>>
>> Thanks,
>> H.
>>
>> Full listing of failing genes for hg18:
>>    - without an initiating ATG:
>>        CCDS43136.1, CCDS34059.1, CCDS43376.1, CCDS34458.1, CCDS34457.1,
>>        CCDS34737.1, CCDS6359.2, CCDS35004.1, CCDS35044.1, CCDS7878.2,
>>        CCDS7877.2, CCDS41618.1, CCDS31428.1, CCDS31730.1, CCDS31729.1,
>>        CCDS42102.1, CCDS32514.1, CCDS33104.1, CCDS33460.1, CCDS33646.1,
>>        CCDS33647.1
>>    - with one or more in-frame stop codons:
>>        CCDS41340.1, CCDS41339.1, CCDS41283.1, CCDS41282.1, CCDS43091.1,
>>        CCDS43389.1, CCDS43432.1, CCDS41964.1, CCDS41992.1, CCDS42100.1,
>>        CCDS42150.1, CCDS42457.1, CCDS42981.1, CCDS43003.1, CCDS43034.1
>>
>>   
>>     
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>   

-- 
Rachel Harte, Ph.D.
Bioinformatics Engineer
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

