Hi Xue,

I'd like to clarify my answer from Monday.

My explanation from Monday is true in general for the tables and files 
that are part of our RefSeq Genes track. However, the refMrna.fa file is 
different: it isn't actually part of the RefSeq Genes track. Instead, it 
is the input used to create the track.

The refMrna.fa file contains the human RefSeq RNA sequences, both coding 
(NM) mRNAs and non-coding (NR) RNAs, from the Reference Sequence 
collection at NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/). It is updated 
once a week. Because this file is the input to the "RefSeq Genes" track 
rather than its output, it includes sequences that align to multiple 
locations as well as sequences that don't align at all. Once we align 
all the sequences in this file, we apply the parameters stated in the 
RefSeq Genes methods section to decide which alignments remain in the 
track:

"RefSeq RNAs were aligned against the human genome using blat; those
with an alignment of less than 15% were discarded. When a single RNA
aligned in multiple places, the alignment having the highest base
identity was identified. Only alignments having a base identity level
within 0.1% of the best and at least 96% base identity with the genomic
sequence were kept."
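
As a rough illustration of the filtering that paragraph describes, here 
is a minimal Python sketch. It is not the actual UCSC pipeline code. The 
Alignment record, its field names, and the filter_alignments function are 
hypothetical, and it assumes that "an alignment of less than 15%" refers 
to the percentage of the RNA that aligned and that "within 0.1% of the 
best" means within 0.1 percentage points of the best base identity.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    rna: str         # RefSeq accession (NM_... or NR_...)
    locus: str       # where on the genome the RNA aligned
    coverage: float  # percent of the RNA that aligned (assumed meaning of "15%")
    identity: float  # percent base identity with the genomic sequence


def filter_alignments(alignments, min_coverage=15.0, min_identity=96.0,
                      near_best=0.1):
    """Apply the thresholds quoted from the methods section."""
    # Step 1: discard alignments below the coverage cutoff.
    by_rna = defaultdict(list)
    for aln in alignments:
        if aln.coverage >= min_coverage:
            by_rna[aln.rna].append(aln)

    # Step 2: per RNA, find the best identity, then keep alignments that
    # are close to the best and also meet the absolute identity floor.
    kept = []
    for alns in by_rna.values():
        best = max(a.identity for a in alns)
        kept.extend(
            a for a in alns
            if a.identity >= best - near_best and a.identity >= min_identity
        )
    return kept
```

Under these assumptions, an RNA can end up with zero, one, or several 
alignments after filtering, which is why the contents of refMrna.fa do 
not map one-to-one onto the items in the track.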


I apologize that my first response didn't include this additional 
information.

Please contact the mail list ([email protected]) again if you have any 
further questions.

Katrina Learned
UCSC Genome Bioinformatics Group


On 10/3/11 7:29 PM, Katrina Learned wrote:
> Hi Xue,
>
> Are you referring to the following paragraph from the methods section of
> the RefSeq Genes track (from which the refMrna.fa file comes)?
>
> "RefSeq RNAs were aligned against the human genome using blat; those
> with an alignment of less than 15% were discarded. When a single RNA
> aligned in multiple places, the alignment having the highest base
> identity was identified. Only alignments having a base identity level
> within 0.1% of the best and at least 96% base identity with the genomic
> sequence were kept."
>
> Although only the single best alignment for each RefSeq entry is
> included in the track, RefSeq itself contains multiple entries for genes
> that have multiple known splice variants. So, the RefSeq Genes track
> does contain splice variants.
>
> Please contact the mail list ([email protected]) again if you have any
> further questions.
>
> Katrina Learned
> UCSC Genome Bioinformatics Group
>
>
>
> On 10/3/11 9:19 AM, xue lin wrote:
>> To whom it may concern,
>> Thank you for answering my question last time. I still have a small
>> question: I read in the introduction to refMrna.fa that there is no
>> redundancy in the file. Does that mean there are no splice variant
>> sequences in the RefSeq mRNA file?
>> Thank you very much.
>> All the best.
>> Xue
>> _______________________________________________
>> Genome maillist  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
