Hello Yunfei,

Unfortunately, not all genes map to only one place in the genome. Here is an excerpt from the RefGene description that explains our selection criteria in these cases:

"When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept."

We cannot advise you on which is the 'best' mapping of a gene that maps multiple times, since we consider all of them valid.

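As a rough illustration of the selection rule quoted above, here is a minimal Python sketch. It assumes that "within 0.1% of the best" means a difference of at most 0.1 percentage points of base identity, and the function and constant names are made up for this example; it is not the actual pipeline code.

```python
from typing import List

MIN_IDENTITY = 96.0  # minimum percent base identity with the genome
MAX_GAP = 0.1        # assumed: allowed gap, in percentage points, below the best


def kept_alignments(identities: List[float]) -> List[float]:
    """Return the identity values of the alignments that pass both thresholds."""
    if not identities:
        return []
    best = max(identities)
    return [
        identity
        for identity in identities
        if identity >= MIN_IDENTITY and best - identity <= MAX_GAP
    ]
```

Under this reading, an RNA whose alignments tie (or nearly tie) for best identity keeps all of them, which is why one NM accession can appear at several locations.
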
Other gene sets, such as the UCSC gene set, have unique identifiers for each mapping of a gene. Within this gene set, there are two tables, knownIsoforms and knownCanonical, that correspond to all the genes or a single representative of each cluster of genes, respectively. Depending on your needs, you may be able to use one of these instead.

I hope this clears things up for you.

Best,
Antonio Coelho
UCSC Genome Bioinformatics Group

Li, Yunfei wrote:
> Hello,
>
> I downloaded the file "upstream1000.fa.gz" - Sequences 1000 bases upstream of
> annotated transcription starts for RefSeq genes with annotated 5' UTRs. It
> seems that sometimes one NM name has multiple sequences, when it shows
> up at different locations on the same or different chromosomes, for example
> "NM_175342 has 3, all from chr14; NM_023052 has 6, 2 from
> chrUn_random and 4 from chr4".
>
> If I want to keep only one sequence for each NM name (since the sequence
> analysis software I am using requires this), how can I decide which one to
> keep so that it makes the most sense?
>
> Best,
>
> Yunfei Li
> --------------------------------------------------------------------------------------
> Research Assistant
> Department of Statistics &
> School of Molecular Biosciences
> Biotechnology Life Sciences Building 427
> Washington State University
> Pullman, WA 99164-7520
> Phone: 509-339-5096
> http://www.wsu.edu/~ye_lab/people.html
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
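
For readers facing the same question, the following Python sketch groups the records of a FASTA file by accession so that multiply mapped NM names can be listed and reviewed. It assumes each header line begins with the accession (for example ">NM_175342_..."); that header layout, the helper names, and the regular expression are assumptions of this example, not a documented format. It deliberately does not pick a "best" record, since the reply above treats all mappings as valid.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)

# Assumed header layout: the accession comes first, e.g. "NM_175342_...".
ACCESSION = re.compile(r"(NM_\d+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_accession(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Collect records under their accession, preserving file order."""
    groups: Dict[str, List[Record]] = {}
    for header, seq in records:
        match = ACCESSION.match(header)
        key = match.group(1) if match else header  # fall back to the full header
        groups.setdefault(key, []).append((header, seq))
    return groups


def multi_mapped(groups: Dict[str, List[Record]]) -> List[str]:
    """Return the accessions that have more than one record."""
    return [acc for acc, recs in groups.items() if len(recs) > 1]
```

The downloaded file can be read directly with `gzip.open("upstream1000.fa.gz", "rt")` and passed to `read_fasta`. If the analysis software needs exactly one sequence per name, taking the first record in each group is one option, but that choice is arbitrary and should be reported as such.
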
