Hello jlw,

Looking at the two protein sequences sent in the previous message in this
thread, they seem to diverge right before the end of the first exon, so I
wonder whether your program is parsing the exon/intron boundaries
correctly.
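For illustration only, here is a minimal Python sketch of one way to splice the coding sequence out of a knownGene-style record and translate it. This is not UCSC code, and the helper names are hypothetical. The sketch makes these assumptions:

- The chromosome sequence is already loaded as a single string.
- exonStarts/exonEnds/cdsStart/cdsEnd are UCSC's 0-based, half-open values.
- The standard genetic code applies.

It upper-cases the sequence, because the UCSC chromosome FASTA files are soft-masked in lowercase. It reverse-complements minus-strand genes, because genePred coordinates are always given on the plus strand. Both are common places for a hand-written translator to go wrong.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Complement table for upper-case DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_coord_list(field):
    """Parse a comma-separated coordinate list such as '0,14,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def extract_cds(chrom_seq, strand, cds_start, cds_end, exon_starts, exon_ends):
    """Splice the CDS from 0-based, half-open exon and CDS coordinates.

    Each exon is clipped to [cds_start, cds_end) so UTR bases are dropped.
    A non-coding transcript (cds_start == cds_end) yields an empty string.
    """
    pieces = []
    for start, end in zip(exon_starts, exon_ends):
        lo = max(start, cds_start)
        hi = min(end, cds_end)
        if lo < hi:  # exon overlaps the coding region
            pieces.append(chrom_seq[lo:hi])
    cds = "".join(pieces).upper()  # undo soft-masking
    if strand == "-":
        cds = cds.translate(COMPLEMENT)[::-1]  # reverse complement
    return cds


def translate(cds):
    """Translate codon by codon; codons with unknown bases (e.g. N) become X."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Toy chromosome: 2 bp UTR, exon 1 CDS, lowercase intron, exon 2 CDS, 2 bp UTR.
    chrom = "GG" + "ATGAAA" + "gtaagt" + "TTTTAA" + "CC"
    starts = parse_coord_list("0,14,")
    ends = parse_coord_list("8,22,")
    cds = extract_cds(chrom, "+", 2, 20, starts, ends)
    print(cds)             # ATGAAATTTTAA
    print(translate(cds))  # MKF*
```

If the two translations agree up to an exon edge and then diverge, the per-exon clipping step is usually the first thing worth checking.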

Another issue that may affect coordinate calculation: does your
software take into account UCSC's 0-based start and 1-based end
coordinate system? Please see this FAQ for more information:

http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1
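To illustrate the coordinate point, here is a small Python sketch. Again, these are hypothetical helper names and not UCSC code. In UCSC's 0-based, half-open convention, an interval [start, end) contains end - start bases and maps directly onto Python slicing. Treating the 0-based start as if it were 1-based shifts the reading frame by one base.

```python
def ucsc_to_one_based(start0, end):
    """Convert a UCSC 0-based, half-open [start0, end) interval to 1-based, closed [start1, end]."""
    return start0 + 1, end


def one_based_to_ucsc(start1, end):
    """Convert a 1-based, closed [start1, end] interval back to UCSC 0-based, half-open form."""
    return start1 - 1, end


def interval_length(start0, end):
    """Number of bases in a 0-based, half-open interval."""
    return end - start0


if __name__ == "__main__":
    seq = "GGATGAAA"  # toy sequence; 0-based index 2 is the A of ATG
    cds_start = 2     # UCSC-style 0-based start
    print(seq[cds_start:cds_start + 3])      # ATG (correct frame)
    # Mistakenly treating cds_start as 1-based and subtracting 1 shifts the frame:
    print(seq[cds_start - 1:cds_start + 2])  # GAT
    print(ucsc_to_one_based(2, 5))           # (3, 5)
```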

Hopefully this information was helpful and answers your question. If you
have further questions or require clarification, feel free to contact the
mailing list at [email protected].

Best regards,

Pauline Fujita

UCSC Genome Bioinformatics Group
http://genome.ucsc.edu




On 10/13/10 5:09 AM, James Lyons-Weiler wrote:
> mary...
>
> this use of cds start issue has been very confusing to us here.  maybe you
> can help with additional details.
>
> what does it mean to 'use the cds start as the start codon'... in terms of
> algorithms, please? do you mean a literal translation from that codon,
> whether the 1st triplet of the cds is atg or not?
>
> what are the consequences of using the cds start as the start codon when
> the translation start codon is known and annotated and should be used
> instead?
>
> mary, fyi we are using our own translator, not any ucsc software.  is the
> ucsc software programmed to anticipate the cds start as the 'start' codon
> but still return the translation of the annotated transcript or something?
>
> jlw
> director
> bioinformatics analysis core
> pitt
>
>
>   
>> Hi Mary,
>>
>> Thanks for the answer, but I would like to know why the translation results
>> for many genes come out different when the CDS start given in the UCSC
>> Genome Browser table is taken as the translation start.
>> For example, I downloaded the complete chromosome 1 of hg19 from the UCSC
>> Genome Browser whole-genome FTP downloads, and then I carefully extracted
>> all the exonic regions from the base at CDS start to the base at CDS end
>> for a gene/transcript (uc010nya.1), using the exon start and end positions
>> and the CDS start and end obtained from UCSC. Then I translated those
>> regions, assuming that the reading frame begins at CDS start (translation
>> start), and the resulting string of amino acids differs from the protein
>> sequence of uc010nya.1 obtained from the UCSC Genome Browser.
>>
>> The two sequences are:
>>
>>     
>>> Manual translation from CDS start till CDS end for uc010nya.1
>>>       
>> MSESRQTHVTLHDIDPQALDQLVQFAYTAEIVVGEGNVQDSAPSRQSPAA
>> EWRPRRLLQVSTESARPLQLPGYPGLCRCALLQRPAQGRPQVRAAALRGR
>> GQDRGVYAAAPETGNSWRAQPSXXXXXXXXLCL*LPTPFCS*HSPAHNP*
>> CLLCVPETFLDLGPPGASSVAPDSARPLPV*TLSPHLLTX
>>
>>     
>>> uc010nya.1 obtained from UCSC Genome Browser table
>>>       
>> MSESRQTHVTLHDIDPQALDQLVQFAYTAEIVVGEGNVQTLLPAASLLQLNGVRDACCKF
>> LLSQLDPSNCLGIRGFADAHSCSDLLKAAHRYVLQHFVDVAKTEEFMLLPLKQVTAGGPS
>> PRPPPHPTPVFVFDSRPRFVPDTALPTILSACCVSPRPFWIWAPQEPRLWLLTLLGPSQY
>> EHSAPTC
>>
>> Starting right after "GNVQ" in the first line of the first sequence, you
>> will see it deviate from the second sequence.
>>
>> Please let me know why they differ.
>>
>> Thank you,
>> Rahil Sethi
>>
>>     
>>> Hi Rahil,
>>>
>>> Thank you so much for giving the assembly, track and table you were
>>> using when you encountered your question - it is much appreciated!
>>>
>>> UCSC Genes does not have cdsStartStat, cdsEndStat or exonFrames fields
>>> like most of our gene prediction tracks (more information about why can
>>> be found in this previous mailing list question:
>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-September/023585.html).
>>> This means that you can use the CDS start and CDS end as start and stop
>>> codons. Please keep in mind that we have made the CDS start equal the
>>> CDS end for non-coding genes.
>>>
>>> I hope this information is helpful.  Please feel free to contact the
>>> mailing list again if you require further assistance.
>>>
>>> Best,
>>> Mary
>>> ------------------
>>> Mary Goldman
>>> UCSC Bioinformatics Group
>>>
>>> On 10/11/10 7:29 AM, [email protected] wrote:
>>>       
>>>> Hello,
>>>>
>>>> I am trying to extract the codon start and codon stop for a set of genes
>>>> in a given position range from the Tables in the UCSC Genome Browser.
>>>> Whenever I click output for Genes and Gene Predictions in a chromosome
>>>> position range, it gives me all the features of the genes, like exon
>>>> start, exon stop, CDS start and CDS stop, but it does not give me the
>>>> codon start (start position of the first codon, i.e. translation start)
>>>> or the codon stop (position of the stop codon, i.e. translation stop).
>>>>
>>>> Please let me know how I can get this information.
>>>>
>>>> I am using:
>>>> Genome: Hg19
>>>> Group: Genes and Gene Prediction Tracks
>>>> Track: UCSC Genes
>>>> Table: knownGene
>>>> region: defined regions
>>>>
>>>> Thank you,
>>>> Rahil Sethi
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>
>>>>         
>>     
>
>
> --
> Thank you very much,
>
> James Lyons-Weiler
>
>
> Director, Bioinformatics Analysis Core
> Genomics and Proteomics Core Laboratories
> Department of Biomedical Informatics
> University of Pittsburgh Cancer Institute
> 3rd Floor
> 3343 Forbes Avenue
> Pittsburgh, PA 15260
> phone: 412-728-8743
> reply-to: [email protected]
>   
