Hello,

You are downloading the entire mRNA sequence, which includes the 5' 
and 3' UTRs. The mRNA is extracted from the RefSeq record at GenBank. To 
trim the nucleotide sequence so that it represents only the coding 
region, use the cdsStart/cdsEnd fields in the primary table knownGene. 
Be sure to use the coordinates correctly for minus strand genes (-).
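
As a rough illustration of that trimming step, here is a minimal Python 
sketch (not an official UCSC tool). It assumes genePred-style fields 
(strand, exonStarts, exonEnds, cdsStart, cdsEnd) with 0-based, half-open 
genomic coordinates, and it assumes the mRNA has exactly the same length 
as the sum of the exon blocks, which may not hold for every RefSeq record 
(for example when the mRNA and the genome differ by indels or the record 
carries a poly-A tail). The function names are hypothetical.

```python
def cds_offsets_in_mrna(strand, exon_starts, exon_ends, cds_start, cds_end):
    """Return (start, end) of the CDS as 0-based, half-open mRNA offsets.

    All genomic inputs are assumed to be 0-based, half-open and given on
    the + strand, regardless of the strand of the gene.
    """
    exons = list(zip(exon_starts, exon_ends))
    tx_len = sum(end - start for start, end in exons)

    def exonic_bases_before(pos):
        # Count exonic bases located to the left of genomic position pos.
        return sum(max(0, min(end, pos) - start) for start, end in exons)

    left = exonic_bases_before(cds_start)
    right = exonic_bases_before(cds_end)

    if strand == "-":
        # The mRNA reads right to left on the genome, so mirror the offsets.
        left, right = tx_len - right, tx_len - left
    return left, right


def trim_to_cds(mrna, strand, exon_starts, exon_ends, cds_start, cds_end):
    """Slice the coding region out of an mRNA string (assumption: the mRNA
    length equals the summed exon lengths)."""
    left, right = cds_offsets_in_mrna(
        strand, exon_starts, exon_ends, cds_start, cds_end
    )
    return mrna[left:right]
```

For a made-up two-exon gene with exons 100-200 and 300-400 and a CDS of 
120-350, the plus strand offsets would be (20, 150) and the minus strand 
offsets (50, 180). It is worth confirming that the trimmed sequence has a 
length divisible by three before using it.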

Coordinate help:
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

Using the Table browser, if you chose to export the genomic sequence 
instead, and selected CDS only, both the start and stop codons would be 
included in the output.
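
If you go the CDS-only route, a quick sanity check before counting codons 
might look like the Python sketch below. The function names are 
hypothetical, and the input is assumed to be a plain nucleotide string 
already read from the FASTA output. Records that fail a check (for 
example, incomplete CDS annotations) are probably best set aside rather 
than forced into a frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq):
    """Report whether a CDS looks complete and in frame."""
    seq = seq.upper()
    return {
        "length_multiple_of_3": len(seq) % 3 == 0,
        "starts_with_ATG": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
    }


def split_codons(seq):
    """Split a CDS into codons starting at the first base.

    Any trailing one or two bases that do not form a full codon are dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]
```

For example, check_cds("ATGAAATAG") passes all three checks, and 
split_codons("ATGAAATAG") gives ATG, AAA, TAG.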

Hopefully this helps,
Jennifer


---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 2/23/10 10:52 AM, Lee Sande wrote:
> Hi,
> I've downloaded the hg18 CDS sequence as follows:
> assembly: mar 2006
> group: genes and gene prediction track
> track: refseq genes
> table: refGene
> and then
> output format: sequence
> and then select mRNA on the following page
>
> I notice that many CDS sequences do not start with ATG, and this does not
> seem to be due to strand.
> I also notice that many CDS sequences do not end with one of the three
> canonical stop codons (TAG, TGA, TAA).
>
> I am computing codon bias, which is extremely sensitive to frame shift, so I
> want to be absolutely sure that I have the right codons.
>
> Can you please help me understand how I can go about processing this data so
> that I have the right frame (even if, say, the exact CDS start or stop site
> is unknown)?
>
> --Thanks
> Lee
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome