Hi,
According to the Methods section of the CCDS track page for hg18,
one of the criteria used to assess each gene is:
- an initiating ATG, a valid stop codon, and no in-frame stop codons
However, when I use some tools to extract and translate the transcripts
for all the genes in the track, I find that some of the genes fail to
satisfy this criterion. More precisely:
- 21 genes lack an initiating ATG (e.g. CCDS43136.1 and
CCDS34059.1; see the full listing at the end of this email).
- 15 genes contain one or more in-frame stop codons. E.g. the
CCDS43034.1 gene (on the + strand of chr22) has an in-frame stop
codon 9 bases upstream of the stop codon located at the position
specified in the cdsEnd column of the ccdsGene table for
that gene.
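For reference, the criterion quoted above can be checked roughly as
follows. This is only a minimal Python sketch, not the tools used for
the counts in this email: the function name check_cds is made up for
illustration, it assumes the standard stop codons (TAA, TAG, TGA), and
it assumes the CDS has already been spliced and, for genes on the -
strand, reverse-complemented.

```python
# Standard stop codons (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_cds(cds):
    """Return the list of criteria that a spliced, strand-corrected CDS fails.

    Hypothetical helper for illustration only. A trailing partial codon,
    if any, is ignored by this check.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    problems = []
    if not codons or codons[0] != "ATG":
        problems.append("no initiating ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")
    # Every codon before the last one must be a sense codon.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems

# Toy sequences, not real CCDS data.
print(check_cds("ATGAAACCCTAA"))     # passes all three checks
print(check_cds("CTGAAACCCTAA"))     # does not start with ATG
print(check_cds("ATGTAGAAACCCTAA"))  # stop codon before the final one
```

An empty list means the sequence satisfies all three parts of the
criterion.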
When I use the Genome Browser to display CCDS43136.1 and CCDS43034.1
for hg18, I can *see* confirmation of the problem. But if I click on
the CCDS43034.1 gene and then follow the link to the protein sequence,
the sequence is truncated at the in-frame stop codon, not at the
stop codon located at ccdsGene.cdsEnd. So I'm wondering: why isn't
ccdsGene.cdsEnd set to the end of the effective stop codon?
For hg19, the situation is slightly worse. In addition to genes with
the same problems as reported above, 3 genes have a cumulative
CDS length that is not even a multiple of 3 (CCDS47664.1, CCDS47663.1
and CCDS45377.1).
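To make "cumulative CDS length" concrete, here is a minimal Python
sketch of how it can be computed from one row of a genePred-style table
such as ccdsGene. It assumes 0-based, half-open coordinates and
comma-separated exonStarts/exonEnds strings (possibly with a trailing
comma). The function name cds_length and the coordinates in the example
are made up for illustration.

```python
def cds_length(cds_start, cds_end, exon_starts, exon_ends):
    """Sum the exonic bases that fall inside [cds_start, cds_end).

    Hypothetical helper; exon_starts and exon_ends are comma-separated
    strings as stored in the exonStarts and exonEnds columns.
    """
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    total = 0
    for start, end in zip(starts, ends):
        # Clip each exon to the coding region.
        lo = max(start, cds_start)
        hi = min(end, cds_end)
        if hi > lo:
            total += hi - lo
    return total

# Toy two-exon gene, not a real CCDS record.
length = cds_length(110, 330, "100,300,", "200,400,")
print(length, length % 3 == 0)
```

A gene whose cumulative CDS length is not a multiple of 3 cannot be
split into whole codons from the start codon to the stop codon.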
I would be very thankful if someone could provide some insight into
this.
Thanks,
H.
Full listing of failing genes for hg18:
- without an initiating ATG:
CCDS43136.1, CCDS34059.1, CCDS43376.1, CCDS34458.1, CCDS34457.1,
CCDS34737.1, CCDS6359.2, CCDS35004.1, CCDS35044.1, CCDS7878.2,
CCDS7877.2, CCDS41618.1, CCDS31428.1, CCDS31730.1, CCDS31729.1,
CCDS42102.1, CCDS32514.1, CCDS33104.1, CCDS33460.1, CCDS33646.1,
CCDS33647.1
- with one or more in-frame stop codons:
CCDS41340.1, CCDS41339.1, CCDS41283.1, CCDS41282.1, CCDS43091.1,
CCDS43389.1, CCDS43432.1, CCDS41964.1, CCDS41992.1, CCDS42100.1,
CCDS42150.1, CCDS42457.1, CCDS42981.1, CCDS43003.1, CCDS43034.1
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: [email protected]
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome