Hi,

I downloaded the mm9 chromosomal sequences (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/...) and the database files knownGene.txt and knownGeneMrna.txt (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/) from UCSC. Using the chromosomal exon locations in knownGene.txt, I extracted the mRNA sequences for the known genes and compared them to the sequences in knownGeneMrna.txt. Unfortunately, about 1/4 of the sequences differ, either by single-nucleotide substitutions or by different exon start/end positions.

substitution:
uc008wki.1
...cctcctAtactggagct...
...cctcctGtactggagct...

start:
uc008wjb.1
cggcgtgggactgggagtccgtcc...
gcgtgggactgggagtccgtccgg...

end:
uc008wkk.1
...gatttttttaaccataaaaaaaaaaaaaaaaaaaaaaaaaa
...gatttttttaaccata

Can anyone please explain these differences and/or give me a hint as to which data to use? (I'm looking for motifs in the processed mRNA.)

Many thanks,
Marten

-- 
Marten Jäger, Msc Bioinformatik
Charité - Universitätsmedizin Berlin
Campus Virchow Klinikum
Institut für Medizinische Genetik und Humangenetik
Augustenburger Platz 1
13353 Berlin
Germany
phone: +49/30/450 569135
email: [email protected]
http://genetik.charite.de/institut/
http://compbio.charite.de
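
For reference, the extraction and comparison described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names are made up, the column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) is assumed for knownGene.txt, and coordinates are treated as 0-based, half-open, as in other UCSC tables. The chromosome is assumed to be already loaded as one string.

```python
# Complement table used for minus-strand transcripts (keeps case).
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_known_gene_line(line):
    """Parse one tab-separated knownGene.txt row (assumed column order)."""
    fields = line.rstrip("\n").split("\t")
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "exons": list(zip(starts, ends)),
    }


def spliced_sequence(chrom_seq, exons, strand):
    """Concatenate exon slices (0-based, half-open) into a transcript."""
    seq = "".join(chrom_seq[start:end] for start, end in exons)
    # Minus-strand genes are stored in plus-strand coordinates.
    return reverse_complement(seq) if strand == "-" else seq


def describe_difference(genomic, mrna):
    """Roughly classify how a genome-derived sequence differs from an mRNA."""
    g, m = genomic.lower(), mrna.lower()
    if g == m:
        return "identical"
    if len(g) == len(m):
        mismatches = sum(a != b for a, b in zip(g, m))
        return f"{mismatches} mismatch(es) at equal length"
    if m.startswith(g) and set(m[len(g):]) == {"a"}:
        return f"poly-A tail on mRNA ({len(m) - len(g)} nt)"
    if g in m or m in g:
        return "boundary offset (one sequence contains the other)"
    return "other"
```

With a toy chromosome "ttACGTggTTAAcc" and exons (2, 6) and (8, 12), spliced_sequence gives "ACGTTTAA" on the plus strand and its reverse complement on the minus strand. describe_difference only distinguishes the three cases listed above (substitution, shifted start/end, trailing A's); it does no alignment, so insertions or deletions inside a transcript would simply be reported as "other".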
