Hello Yunfei,
Unfortunately, not all genes map to only one place in the genome. Here 
is an excerpt from the RefGene description that explains our criteria 
for selection in these instances:
"When a single RNA aligned in multiple places, the alignment having the 
highest base identity was identified. Only alignments having a base 
identity level within 0.1% of the best and at least 96% base identity 
with the genomic sequence were kept."
We cannot advise you on which is the 'best' mapping of a gene that maps 
multiple times, since we consider all of them valid.
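
As a rough illustration only, the selection rule quoted above could be sketched as follows. This is not the actual RefGene pipeline code. The function name and parameters are hypothetical, and it assumes "within 0.1%" means 0.1 percentage points of base identity.

```python
from typing import List


def keep_alignments(identities: List[float],
                    min_identity: float = 96.0,
                    window: float = 0.1) -> List[float]:
    """Apply the quoted rule to the percent identities of one RNA's alignments.

    Assumption: `window` is measured in percentage points below the best
    identity. The real pipeline may define it differently.
    """
    if not identities:
        return []
    best = max(identities)
    # Keep alignments that are close to the best one AND at least 96% identical.
    return [pid for pid in identities
            if pid >= min_identity and best - pid <= window]
```

Under this reading, several alignments of the same RNA can pass the filter at once, which is why one NM accession can appear at more than one location.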

Other gene sets, such as the UCSC gene set, have unique identifiers for 
each mapping of a gene. Within this gene set, there are two tables, 
knownIsoforms and knownCanonical, which correspond to all the genes and 
to a single representative of each cluster of genes, respectively. 
Depending on your needs, you may be able to use one of these instead.
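
If your software still requires exactly one sequence per NM name, here is a minimal sketch of one way to reduce the FASTA file. It keeps the first record seen for each accession, which is an arbitrary choice and not a recommendation of a 'best' mapping. It assumes each header begins with the accession (for example "NM_175342_up_1000_..."), so please check this against your copy of the file. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: the header starts with a RefSeq accession such as NM_175342.
ACCESSION = re.compile(r"^(N[MR]_\d+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def first_per_accession(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep only the first record for each accession (arbitrary policy)."""
    seen = set()
    for header, seq in records:
        match = ACCESSION.match(header)
        # Fall back to the whole header if no accession is recognized.
        key = match.group(1) if match else header
        if key not in seen:
            seen.add(key)
            yield header, seq
```

To run it on the downloaded file, you could open it with gzip.open("upstream1000.fa.gz", "rt") and pass the file object to read_fasta. A different policy (for example, preferring records not on chrUn_random) would only change the logic in first_per_accession.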

I hope this clears things up for you.

Best,
Antonio Coelho
UCSC Genome Bioinformatics Group

Li, Yunfei wrote:
> Hello,
>
> I downloaded the file "upstream1000.fa.gz" (sequences 1000 bases upstream of 
> annotated transcription starts for RefSeq genes with annotated 5' UTRs). It 
> seems that one NM name sometimes has multiple sequences, which come from 
> different locations on the same or different chromosomes. For example, 
> NM_175342 has 3 sequences, all from chr14; NM_023052 has 6 sequences, 2 from 
> chrUn_random and 4 from chr4.
>
> If I want to keep only one sequence for each NM name (the sequence analysis 
> software I am using requires this), how can I decide which one it would make 
> the most sense to keep?
>
> Best,
>
> Yunfei Li
> --------------------------------------------------------------------------------------
> Research Assistant
> Department of Statistics &
> School of Molecular Biosciences
> Biotechnology Life Sciences Building 427
> Washington State University
> Pullman, WA 99164-7520
> Phone: 509-339-5096
> http://www.wsu.edu/~ye_lab/people.html
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>   

