Hi Xue, I'd like to clarify my answer from Monday.
My explanation from Monday is true in general for the tables and files that are part of our RefSeq Genes track. However, the refMrna.fa file is different: it isn't actually part of the RefSeq Genes track, it is the input used to create the track.

The refMrna.fa file contains the human RefSeq coding (NM) mRNA and non-coding (NR) RNA sequences from the Reference Sequence collection at NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/). It is updated once a week. Because it is the unfiltered input, the refMrna.fa file includes sequences that align to multiple locations as well as sequences that don't align at all.

Once we have aligned all the sequences in this file, we apply the parameters stated in the RefSeq Genes methods section to decide which alignments remain in the track:

"RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept."

I apologize that my first response didn't include this additional information.

Please contact the mail list ([email protected]) again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

On 10/3/11 7:29 PM, Katrina Learned wrote:
> Hi Xue,
>
> Are you referring to the following paragraph from the methods section of
> the RefSeq Genes track (from which the refMrna.fa file comes)?
>
> "RefSeq RNAs were aligned against the human genome using blat; those
> with an alignment of less than 15% were discarded. When a single RNA
> aligned in multiple places, the alignment having the highest base
> identity was identified. Only alignments having a base identity level
> within 0.1% of the best and at least 96% base identity with the genomic
> sequence were kept."
>
> Although only the single best alignment for each RefSeq entry is
> included in the track, RefSeq itself contains multiple entries for genes
> that have multiple known splice variants. So, the RefSeq Genes track
> does contain splice variants.
>
> Please contact the mail list ([email protected]) again if you have any
> further questions.
>
> Katrina Learned
> UCSC Genome Bioinformatics Group
>
> On 10/3/11 9:19 AM, xue lin wrote:
>> Dear whom concerned,
>> Thank you for answering my question last time. I still have a small
>> question: I read in the introduction of the refMrna.fa that there is no
>> redundancy in the file. Does it mean that there are no splicing sequences in
>> the mRNA refseq file?
>> Thank you very much.
>> All the best.
>> Xue
>> _______________________________________________
>> Genome maillist [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
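
For anyone who wants to see the filtering rule from the RefSeq Genes methods paragraph in concrete terms, here is a minimal Python sketch. It is not the code we run to build the track. The Alignment record, its field names, and the accession-like IDs in the usage note are made up for illustration. It also assumes that "an alignment of less than 15%" means the percentage of the RNA's bases that aligned, and that "within 0.1% of the best" means 0.1 percentage points of base identity.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one alignment of one RNA to the genome."""
    rna_id: str      # RNA identifier, e.g. an NM or NR accession
    chrom: str       # chromosome the RNA aligned to
    start: int       # alignment start on the chromosome
    coverage: float  # percent of the RNA's bases aligned (0-100), assumed meaning
    identity: float  # percent base identity with the genome (0-100)


def filter_alignments(alignments, min_coverage=15.0, near_best=0.1,
                      min_identity=96.0):
    """Keep the alignments that pass the three rules quoted in the methods text."""
    # Rule 1: discard alignments covering less than 15% of the RNA.
    by_rna = defaultdict(list)
    for aln in alignments:
        if aln.coverage >= min_coverage:
            by_rna[aln.rna_id].append(aln)

    kept = []
    for alns in by_rna.values():
        # Rule 2: find the highest base identity among this RNA's alignments.
        best = max(a.identity for a in alns)
        # Rule 3: keep alignments within 0.1 of the best AND at least 96%.
        kept.extend(
            a for a in alns
            if a.identity >= best - near_best and a.identity >= min_identity
        )
    return kept
```

With input like the following (invented values), only the first two alignments of NM_example1 survive: the third is too far below that RNA's best identity, NR_example2 fails the coverage cutoff, and NM_example3 fails the 96% identity floor. A sequence such as NR_example2 or NM_example3 would still be present in refMrna.fa even though it has no alignment in the track.

    alns = [
        Alignment("NM_example1", "chr1", 100, 98.0, 99.5),
        Alignment("NM_example1", "chr2", 500, 97.0, 99.45),
        Alignment("NM_example1", "chr3", 900, 95.0, 98.0),
        Alignment("NR_example2", "chr4", 10, 10.0, 100.0),
        Alignment("NM_example3", "chr5", 20, 90.0, 95.0),
    ]
    kept = filter_alignments(alns)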
