Hello,

You are downloading the entire mRNA sequence, which includes the 5' and 3' UTRs. The mRNA is extracted from the RefSeq record at GenBank. To trim the nucleotide sequence so that it represents only the coding region, use the cdsStart and cdsEnd fields in the primary table knownGene. These are genomic coordinates, so they have to be mapped through the exons onto the mRNA. Be sure to apply the coordinates correctly for minus-strand (-) genes.
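As a rough illustration of the trimming step, here is a minimal Python sketch. It assumes the usual genePred conventions (0-based, half-open genomic coordinates; cdsStart < cdsEnd on both strands) and that the downloaded mRNA has exactly the same length as the sum of the exons, which may not hold for every RefSeq record (for example, if the GenBank sequence carries a poly-A tail or differs from the genome). The function names cds_offsets_in_mrna and split_codons are made up for this example and are not part of any UCSC tool.

```python
def cds_offsets_in_mrna(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Map genomic cdsStart/cdsEnd onto the spliced mRNA.

    Returns (start, end) as 0-based, half-open offsets into the mRNA,
    read 5' to 3'. Assumes the mRNA is exactly the concatenated exons.
    """
    exons = sorted(zip(exon_starts, exon_ends))
    mrna_len = sum(end - start for start, end in exons)

    def exonic_bases_before(pos):
        # Number of exonic bases with a genomic coordinate below pos.
        return sum(max(0, min(end, pos) - start) for start, end in exons)

    left = exonic_bases_before(cds_start)
    right = exonic_bases_before(cds_end)
    if strand == "+":
        return left, right
    if strand == "-":
        # The mRNA is the reverse complement, so offsets are counted
        # from the other end of the transcript.
        return mrna_len - right, mrna_len - left
    raise ValueError("strand must be '+' or '-'")


def split_codons(cds):
    """Split a CDS into codons, dropping any incomplete trailing codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Toy transcript (invented data): 3 bases of 5' UTR, a 15-base CDS,
# and 4 bases of 3' UTR, spread over two exons.
mrna = "GGCATGAAACCCGGGTAATTTT"
start, end = cds_offsets_in_mrna("+", 103, 208, [100, 200], [110, 212])
cds = mrna[start:end]
codons = split_codons(cds)
```

Checking that the first codon is ATG and the last is one of TAA, TAG or TGA is a quick way to confirm the frame before computing codon usage.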
Coordinate help:
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

Alternatively, if you use the Table Browser to export the genomic sequence instead and select CDS only, both the start and stop codons are included in the output.

Hopefully this helps,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/23/10 10:52 AM, Lee Sande wrote:
> Hi,
> I've downloaded the hg18 CDS sequence as follows:
>   assembly: Mar. 2006
>   group: Genes and Gene Prediction Tracks
>   track: RefSeq Genes
>   table: refGene
> and then
>   output format: sequence
> and then select mRNA on the following page.
>
> I notice that many CDS sequences do not start with ATG, and this does not
> seem to be due to strand.
> I also notice that many CDS sequences do not end with one of the three
> canonical stop codons (TAG, TGA, TAA).
>
> I am computing codon bias, which is extremely sensitive to frame shift, so
> I want to be absolutely sure that I have the right codons.
>
> Can you please help me understand how I can go about processing this data
> so that I have the right frame (even if, say, the exact CDS start or the
> stop site is unknown)?
>
> --Thanks
> Lee
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
