Hi Marten,

The differences you are seeing are definitely expected.

The sequence found at ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/... is the mouse reference genome sequence, and it came from sequencing mouse DNA. The sequence in knownGeneMrna.txt is based on mRNA and protein sequence from several sources (click on the blue "UCSC Genes" link on http://genome.ucsc.edu/cgi-bin/hgTracks to read more about how this file was created). The knownGeneMrna sequence is aligned to the genomic sequence using BLAT.

The single base differences are SNPs, and the different exon start/end positions are a result of parts of the mRNA sequence not aligning to the genome, for instance, when there is a polyA tail on the mRNA.

If you need mRNA sequence, I suggest using the knownGeneMrna.txt sequence rather than the genomic sequence.

I hope this is helpful. If you have further questions, please feel free to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 02/07/11 05:00, Marten Jäger wrote:
> Hi,
>
> I downloaded the chromosomal sequences
> (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/...) and the
> Database files (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/)
> for knownGene.txt and knownGeneMrna.txt from UCSC. Using the chromosomal
> locations for the exons using knownGene.txt I extracted the mRNA
> Sequences for the knownGenes and compared them to the sequences in
> knownGeneMrna.txt. Unfortunately about 1/4 of the sequences differ in
> single nucleotide mutations
>
> substitution: uc008wki.1
>
> ...cctcctAtactggagct...
> ...cctcctGtactggagct...
>
> or different exon start/end positions:
>
> start: uc008wjb.1
>
> cggcgtgggactgggagtccgtcc...
> gcgtgggactgggagtccgtccgg...
>
> end: uc008wkk.1
>
> ...gatttttttaaccataaaaaaaaaaaaaaaaaaaaaaaaaa
> ...gatttttttaaccata
>
>
> Can anyone please explain these differences and/or give me a hint which
> data to use (I'm looking for motifs in the processed mRNA).
>
> Many Thanks.
>
> Marten
>

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
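
For anyone who wants to sort such mismatches into the categories described in the reply (single-base differences versus a trailing polyA run on the mRNA), here is a minimal Python sketch. It is not part of any UCSC tool. The function name `classify_difference`, the category labels, and the `max_mismatches` threshold are all hypothetical choices made for illustration. The example strings are the fragments quoted in the message, and which member of each pair is the genome-derived one is an assumption.

```python
def classify_difference(genomic: str, mrna: str, max_mismatches: int = 2) -> str:
    """Roughly label how a genome-derived transcript differs from the mRNA record.

    Hypothetical helper: the labels and the mismatch threshold are
    illustrative assumptions, not a UCSC convention.
    """
    g, m = genomic.lower(), mrna.lower()
    if g == m:
        return "identical"
    # mRNA equals the genomic sequence plus a run of trailing A's.
    if m.startswith(g) and set(m[len(g):]) == {"a"}:
        return "polyA_tail"
    # Same length with only a few mismatching positions.
    if len(g) == len(m):
        mismatches = sum(1 for x, y in zip(g, m) if x != y)
        if mismatches <= max_mismatches:
            return "substitution"
    # Everything else, e.g. shifted exon start/end positions.
    return "other"


# Fragments quoted in the original message (pair order assumed).
print(classify_difference("cctcctAtactggagct", "cctcctGtactggagct"))
print(classify_difference("gatttttttaaccata",
                          "gatttttttaaccataaaaaaaaaaaaaaaaaaaaaaaaaa"))
print(classify_difference("cggcgtgggactgggagtccgtcc",
                          "gcgtgggactgggagtccgtccgg"))
```

A shifted start, as in the uc008wjb.1 fragment, falls into "other" here. Telling a shifted boundary apart from unrelated sequence would need a real alignment, which this sketch does not attempt.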
