Hello jlw,

Looking at the two protein sequences sent in the previous question in this thread, they seem to diverge right before the end of the first exon, so I wonder whether your program is parsing the exon/intron boundaries correctly.
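To illustrate the kind of boundary handling in question, here is a minimal Python sketch (not UCSC code) of how a coding sequence could be assembled from a knownGene-style row. It assumes that exon and CDS starts are 0-based and ends are 1-based, so each block maps directly onto a half-open slice. The function names (cds_blocks, spliced_cds, translate) and the toy chromosome are hypothetical and exist only for this example.

```python
# Hypothetical helpers for illustration only; not part of any UCSC tool.
# Assumption: starts are 0-based and ends are 1-based, i.e. half-open [start, end).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and keep the non-empty pieces.

    Exons are expected in ascending genomic order.
    A non-coding row (cds_start == cds_end) yields no blocks.
    """
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


def spliced_cds(chrom_seq, exon_starts, exon_ends, cds_start, cds_end, strand="+"):
    """Concatenate the coding parts of the exons, skipping introns and UTRs."""
    blocks = cds_blocks(exon_starts, exon_ends, cds_start, cds_end)
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    if strand == "-":
        # Minus-strand genes are read from the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq.upper()


def translate(cds):
    """Translate in frame 0 from the first base; unknown codons become 'X'."""
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Toy chromosome: 5' UTR, coding part of exon 1, intron, coding part of exon 2, 3' UTR.
chrom = "CC" + "ATGGC" + "GTAAGTAG" + "TTGA" + "GG"
cds = spliced_cds(chrom, [0, 15], [7, 21], 2, 19)
print(cds, translate(cds))  # prints "ATGGCTTGA MA*"
```

In this sketch the second codon (GCT) spans the exon junction. An off-by-one in either coordinate convention adds or drops a base per block, which is one way a translation can go out of frame around an exon boundary.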

Another issue that may affect coordinate calculation: does your software take into account UCSC's 0-based start and 1-based end coordinate system? Please see this FAQ for more information:

http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1

Hopefully this information was helpful and answers your question. If you have further questions or require clarification, feel free to contact the mailing list at [email protected].

Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

On 10/13/10 5:09 AM, James Lyons-Weiler wrote:
> mary...
>
> this use of cds start issue has been very confusing to us here. maybe you
> can help with additional details.
>
> what does it mean to 'use the cds start as the start codon'... in terms of
> algorithms, please? do you mean a literal translation from that codon,
> whether the 1st triplet of the cds is atg or not?
>
> what are the consequences of using the cds start as the start codon when
> the transcription start codon is known and annotated and should be used,
> instead?
>
> mary, fyi we are using our own translator, not any ucsc software. is the
> ucsc software programmed to anticipate the cds start as the 'start' codon
> but still return the translation of the annotated transcript or something?
>
> jlw
> director
> bioinformatics analysis core
> pitt
>
>> Hi Mary,
>>
>> Thanks for the answer but I would like to know why the translation result
>> for many genes come out to be different when CDS start mentioned in UCSC
>> Genome Browser Table is considered to be the translation start?
>> For example I downloaded a complete chromosome 1 of hg19 from UCSC Genome
>> Browser ftp downloads whole genome and then I carefully extracted the all
>> the exonic regions starting from base at CDS start till base at CDS end
>> for a gene/transcript(uc010nya.1), the exon start and end positions and
>> CDS start and end obtained from UCSC.
>> Then I translated those regions
>> assuming that reading frame begins from CDS start(translation start) and
>> the string of amino acids differ from the protein sequence of uc010nya.1
>> obtained from UCSC Genome Browser.
>>
>> The two sequences are:
>>
>> >Manual translation from CDS start till CDS end for uc010nya.1
>> MSESRQTHVTLHDIDPQALDQLVQFAYTAEIVVGEGNVQDSAPSRQSPAA
>> EWRPRRLLQVSTESARPLQLPGYPGLCRCALLQRPAQGRPQVRAAALRGR
>> GQDRGVYAAAPETGNSWRAQPSXXXXXXXXLCL*LPTPFCS*HSPAHNP*
>> CLLCVPETFLDLGPPGASSVAPDSARPLPV*TLSPHLLTX
>>
>> >uc010nya.1 obtained from UCSC Genome Browser table
>> MSESRQTHVTLHDIDPQALDQLVQFAYTAEIVVGEGNVQTLLPAASLLQLNGVRDACCKF
>> LLSQLDPSNCLGIRGFADAHSCSDLLKAAHRYVLQHFVDVAKTEEFMLLPLKQVTAGGPS
>> PRPPPHPTPVFVFDSRPRFVPDTALPTILSACCVSPRPFWIWAPQEPRLWLLTLLGPSQY
>> EHSAPTC
>>
>> From first line of the first sequence after "GNVQ" you will start seeing
>> the deviation from the second sequence.
>>
>> Please let me know why does it then differ.
>>
>> Thank you,
>> Rahil Sethi
>>
>>> Hi Rahil,
>>>
>>> Thank you so much for giving the assembly, track and table you were
>>> using when you encountered your question - it is much appreciated!
>>>
>>> UCSC Genes does not have cdsStartStat, cdsEndStat or exonFrames fields
>>> like most of our gene prediction tracks (more information about why can
>>> be found in this previous mailing list question:
>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-September/023585.html).
>>> This means that you can use the CDS start and CDS end as start and stop
>>> codons. Please keep in mind that we have made the CDS start equal the
>>> CDS end for non-coding genes.
>>>
>>> I hope this information is helpful. Please feel free to contact the
>>> mail list again if you require further assistance.
>>>
>>> Best,
>>> Mary
>>> ------------------
>>> Mary Goldman
>>> UCSC Bioinformatics Group
>>>
>>> On 10/11/10 7:29 AM, [email protected] wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to extract the codon start and codon stop for a set of genes
>>>> in a given position, from Tables in UCSC Genome Browser. Whenever I click
>>>> output for Genes and Gene Predictions in a chromosome position range, it
>>>> gives me all the feature of genes like exon start, exon stop, CDS start,
>>>> CDS stop, but does not give me the codon start (start position of the
>>>> first codon i.e. translation start) and codon stop (position of stop codon
>>>> i.e. translation stop).
>>>>
>>>> Please let me know how can I get this information?
>>>>
>>>> I am using:
>>>> Genome: Hg19
>>>> Group: Genes and Gene Prediction Tracks
>>>> Track: UCSC Genes
>>>> Table: KnownGene
>>>> region: defined regions
>>>>
>>>> Thank you,
>>>> Rahil Sethi
>>>> _______________________________________________
>>>> Genome maillist - [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
> --
> Thank you very much,
>
> James Lyons-Weiler
>
> Director, Bioinformatics Analysis Core
> Genomics and Proteomics Core Laboratories
> Department of Biomedical Informatics
> University of Pittsburgh Cancer Institute
> 3rd Floor
> 3343 Forbes Avenue
> Pittsburgh, PA 15260
> phone: 412-728-8743
> reply-to: [email protected]
