Hi Marten,

The differences you are seeing are expected.

The sequence found at 
ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/... is the 
mouse reference genome sequence, and it came from sequencing mouse DNA. 
The sequence in knownGeneMrna.txt is based on mRNA and protein sequences 
from several sources (click on the blue "UCSC Genes" link on 
http://genome.ucsc.edu/cgi-bin/hgTracks to read more about how this file 
was created).  The knownGeneMrna sequence is aligned to the genomic 
sequence using BLAT.  The single base differences are SNPs, and the 
different exon start/end positions are a result of parts of the mRNA 
sequence not aligning to the genome, for instance, when there is a polyA 
tail on the mRNA, which is not encoded in the genomic sequence.
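
If it helps to sort your mismatches into these categories, here is a 
minimal Python sketch.  It is not a UCSC tool: the function name 
classify_difference and its labels are made up for illustration.  It 
assumes you already have, for each transcript, the sequence you spliced 
together from the genome and the matching knownGeneMrna sequence as 
plain strings.

```python
def classify_difference(genomic, mrna):
    """Label how an mRNA sequence differs from its genome-derived sequence.

    Hypothetical helper for illustration; the labels are arbitrary.
    """
    g, m = genomic.lower(), mrna.lower()
    if g == m:
        return "identical"
    # mRNA equals the genomic sequence plus a run of only 'a': polyA tail.
    extra = m[len(g):]
    if m.startswith(g) and set(extra) == {"a"}:
        return "polyA tail"
    # Same length, so count position-by-position mismatches.
    if len(g) == len(m):
        n = sum(1 for x, y in zip(g, m) if x != y)
        return "%d substitution(s)" % n
    return "other"


# Short fragments based on the examples in the question.
print(classify_difference("cctcctAtactggagct", "cctcctGtactggagct"))
print(classify_difference("gatttttttaaccata", "gatttttttaaccata" + "a" * 20))
```

A shifted start such as the uc008wjb.1 example will show up as either 
"other" or a large number of substitutions, so those cases need a real 
alignment (for example with BLAT) to interpret.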

If you need mRNA sequence, I suggest using the knownGeneMrna.txt 
sequence rather than the genomic sequence.

I hope this is helpful.  If you have further questions, please feel free 
to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group




On 02/07/11 05:00, Marten Jäger wrote:
> Hi,
> 
> I downloaded the chromosomal sequences 
> (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/...) and the 
> Database files (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/) 
> for knownGene.txt and knownGeneMrna.txt from UCSC. Using the chromosomal 
> locations for the exons using knownGene.txt I extracted the mRNA 
> Sequences for the knownGenes and compared them to the sequences in 
> knownGeneMrna.txt. Unfortunately about 1/4 of the sequences differ in 
> single nucleotide mutations
> 
> substitution: uc008wki.1
> 
> ...cctcctAtactggagct...
> ...cctcctGtactggagct...
> 
> or different exon start/end positions:
> 
> start: uc008wjb.1
> 
> cggcgtgggactgggagtccgtcc...
>    gcgtgggactgggagtccgtccgg...
> 
> end: uc008wkk.1
> 
> ...gatttttttaaccataaaaaaaaaaaaaaaaaaaaaaaaaa
> ...gatttttttaaccata
> 
> 
> Can anyone please explain these differences and/or give me a hint which 
> data to use (I'm looking for motifs in the processed mRNA).
> 
> Many Thanks.
> 
> Marten
> 
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
