Hi Rachel and Katrina,

Thanks for looking into this and for the extensive explanations; they
are very useful! I have one comment/suggestion below.

On 08/16/2010 09:46 AM, Rachel Harte wrote:
> Hello Hervé,
>
> I have some more information to add to the reply below after reviewing
> all the CCDS in your list below. I can confirm that all the CCDS in your
> list, that have in-frame stop codons, are for genes encoding
> selenoproteins so the TGA stop codon in these transcripts is translated
> as the amino acid, selenocysteine. At the moment, the UCSC Genome
> Browser does not recognise these TGA codons as selenocysteine codons so
> they are coloured red as if they are stop codons when you zoom in to the
> base level on the Genome Browser. The CCDS with non-ATG start codons all
> have alternative translation start codons such as CTG or GTG and there
> is experimental evidence that suggests that these alternate start codons
> are the predominant ones used for these genes. I am going to update the
> CCDS track description page on the UCSC Genome Browser to explain these
> exceptions to the criteria listed there.

Thanks for this. What about the protein sequence provided by the
Browser? If I understand correctly, it should not be truncated at the
first in-frame stop codon because this codon actually gets translated.
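
To make the suggestion concrete, here is a minimal Python sketch of the
behavior I have in mind. It is not the Browser's code: the function name
translate_cds and the selenoprotein flag are hypothetical, and it
assumes the CDS is given 5' to 3' with its terminal stop codon included.
With the flag set, an in-frame TGA is emitted as U (selenocysteine)
instead of ending the protein.

```python
# Standard genetic code, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds, selenoprotein=False):
    """Translate a CDS that includes its terminal stop codon.

    Hypothetical helper: when selenoprotein is True, an in-frame TGA is
    translated as selenocysteine (U) rather than treated as a stop.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    last = len(codons) - 1
    protein = []
    for idx, codon in enumerate(codons):
        aa = CODON_TABLE[codon]  # raises KeyError on non-ACGT bases
        if aa == "*":
            if idx == last:
                break  # terminal stop codon: normal end of translation
            if selenoprotein and codon == "TGA":
                aa = "U"  # in-frame TGA read as selenocysteine
            else:
                break  # truncate at the first in-frame stop
        protein.append(aa)
    return "".join(protein)
```

On a toy sequence such as ATGGCTTGAGGCTAA, the default call stops at
the in-frame TGA, while selenoprotein=True reads through to the terminal
stop codon.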

> Thank you for drawing our
> attention to this omission.
>
> Finally, the three CCDS whose nucleotide lengths are not divisible by
> three are pending withdrawal from the CCDS set. These CCDS are from
> genes that are known to be polymorphic, and the reference genome allele
> contains a 1 nt insert. That insert causes a frameshift in translation,
> which can truncate the protein and introduce erroneous sequence, so the
> protein is not likely to be functional. Since CCDS is an annotation of
> the reference genome, we cannot create a CCDS on the reference genome
> that encodes the normal protein for these genes.

I see. Thanks for the clarification.
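
For anyone who wants to reproduce the length check, here is a rough
Python sketch. It is not the tool I actually used; the function names
are hypothetical, and it assumes 0-based, half-open coordinates as in
the exonStarts, exonEnds, cdsStart and cdsEnd columns of a genePred-style
table such as ccdsGene. The cumulative CDS length is the sum of the
overlaps between each exon and the [cdsStart, cdsEnd) interval.

```python
def cds_length(exon_starts, exon_ends, cds_start, cds_end):
    """Sum the exonic bases that fall inside [cds_start, cds_end).

    Assumes 0-based, half-open coordinates (genePred convention).
    """
    total = 0
    for start, end in zip(exon_starts, exon_ends):
        lo = max(start, cds_start)
        hi = min(end, cds_end)
        if lo < hi:  # exon overlaps the coding region
            total += hi - lo
    return total


def is_in_frame(exon_starts, exon_ends, cds_start, cds_end):
    """True when the cumulative CDS length is a multiple of 3."""
    return cds_length(exon_starts, exon_ends, cds_start, cds_end) % 3 == 0
```

With toy coordinates, two exons [100, 200) and [300, 400) and a coding
region of [150, 350) give a cumulative length of 100, which is not a
multiple of 3, so a single extra nucleotide like the insert described
above is enough to make a gene fail this check.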

>
> CCDS is constantly being reviewed for such cases and for new evidence
> that requires the CCDS to be updated. NCBI releases a new CCDS set
> periodically and then these updates come into effect. If you do see any
> other potential problems with CCDS, then please notify the CCDS group at
> [email protected]. Thank you.

I will. Thanks again for the clarifications!

Cheers,
H.

>
> Rachel
>
> On 8/13/10 3:00 PM, Katrina Learned wrote:
>> Hi Hervé,
>>
>> Thank you for your email. One of our staff members is also part of
>> the CCDS project and she has offered the following information:
>>
>> CCDS43034.1 is actually a selenoprotein (SELO, selenoprotein O) and so
>> it has an in-frame stop codon because, in this protein, the in-frame
>> stop codon is translated to a selenocysteine. We are currently
>> determining if this is the case for the other CCDS you found with
>> in-frame stop codons.
>>
>> As for the CCDS without start codons, there are some CCDS that have
>> been annotated with a non-ATG start codon e.g. CTG where there is
>> experimental evidence to suggest that the protein is translated from
>> the non-ATG start codon.
>>
>> Finally, CCDS is constantly being updated, and so the project members
>> are continually reviewing CCDS and correcting any errors or updating
>> annotations based on additional evidence that becomes available. These
>> updates are released periodically.
>>
>> We are currently looking into your additional observations in more
>> detail. Please don't hesitate to contact the mail list again if you
>> have any further questions.
>>
>> Katrina Learned
>> UCSC Genome Bioinformatics Group
>>
>> Hervé Pagès wrote, On 08/13/10 12:50:
>>> Hi,
>>>
>>> According to the Methods section of the CCDS track page for hg18,
>>> one of the criteria used to assess each gene is:
>>>
>>> - an initiating ATG, a valid stop codon, and no in-frame stop codons
>>>
>>> However when using some tools to extract and translate the transcripts
>>> for all the genes in the track, I find that some of the genes fail to
>>> satisfy the criteria. More precisely:
>>>
>>> - 21 genes fail to have an initiating ATG (e.g. CCDS43136.1,
>>> CCDS34059.1, etc..., see full listing at the end of the email).
>>>
>>> - 15 genes have one or more in-frame stop codons. E.g. the
>>> CCDS43034.1 gene (on chr22 strand +) has an in-frame stop
>>> codon 9 bases upstream of the stop codon located at the position
>>> specified in the cdsEnd column of the ccdsGene table for
>>> that gene.
>>>
>>> When using the Genome Browser to display CCDS43136.1 and CCDS43034.1
>>> for hg18, I can *see* a confirmation of the problem. But if I click on
>>> the CCDS43034.1 gene and then follow the link to the protein sequence
>>> then the sequence is truncated at the in-frame stop codon, not at the
>>> stop codon located at ccdsGene.cdsEnd. So I'm wondering why
>>> ccdsGene.cdsEnd isn't set to the end of the effective stop codon?
>>>
>>> For hg19, the situation is slightly worse. In addition to having genes
>>> with the same problems as reported above, 3 genes have a cumulative
>>> CDS length that is not even a multiple of 3 (CCDS47664.1, CCDS47663.1
>>> and CCDS45377.1).
>>>
>>> I would be very thankful if someone could provide some insight about
>>> this.
>>>
>>> Thanks,
>>> H.
>>>
>>> Full listing of failing genes for hg18:
>>> - without an initiating ATG:
>>> CCDS43136.1, CCDS34059.1, CCDS43376.1, CCDS34458.1, CCDS34457.1,
>>> CCDS34737.1, CCDS6359.2, CCDS35004.1, CCDS35044.1, CCDS7878.2,
>>> CCDS7877.2, CCDS41618.1, CCDS31428.1, CCDS31730.1, CCDS31729.1,
>>> CCDS42102.1, CCDS32514.1, CCDS33104.1, CCDS33460.1, CCDS33646.1,
>>> CCDS33647.1
>>> - with one or more in-frame stop codons:
>>> CCDS41340.1, CCDS41339.1, CCDS41283.1, CCDS41282.1, CCDS43091.1,
>>> CCDS43389.1, CCDS43432.1, CCDS41964.1, CCDS41992.1, CCDS42100.1,
>>> CCDS42150.1, CCDS42457.1, CCDS42981.1, CCDS43003.1, CCDS43034.1
>>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319