Hi Sohini,

The duplicate entries you are seeing are from RefSeq genes that aligned to more than one location. There are many such entries.
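
As a rough illustration of how one accession can end up with several entries, here is a minimal Python sketch of the filter stated in the RefSeq track description (keep alignments within 0.1% of the best base identity and with at least 96% identity). It is not the actual UCSC pipeline code. The `Alignment` record, the `filter_alignments` function, the `NM_EXAMPLE` accession and the coordinates are all hypothetical, and reading "within 0.1%" as 0.1 percentage points is an assumption. The 15% alignment cutoff is not modeled.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; identity is percent base identity (0-100).
Alignment = namedtuple("Alignment", "accession chrom strand start identity")


def filter_alignments(alignments, min_identity=96.0, near_best=0.1):
    """Keep, per accession, every alignment that reaches min_identity and
    lies within near_best percentage points of that accession's best hit.

    Assumption: "within 0.1% of the best" means 0.1 percentage points.
    """
    by_accession = defaultdict(list)
    for aln in alignments:
        by_accession[aln.accession].append(aln)

    kept = []
    for alns in by_accession.values():
        best = max(a.identity for a in alns)
        kept.extend(
            a for a in alns
            if a.identity >= min_identity and a.identity >= best - near_best
        )
    return kept


# Made-up alignments for one accession: two equally good hits on opposite
# strands, plus a weaker third hit.
example = [
    Alignment("NM_EXAMPLE", "chr22", "+", 1000, 100.0),
    Alignment("NM_EXAMPLE", "chr22", "-", 5000, 100.0),
    Alignment("NM_EXAMPLE", "chr22", "+", 9000, 97.0),
]

kept = filter_alignments(example)
for a in kept:
    print(a.accession, a.chrom, a.strand, a.start, a.identity)
```

In this made-up case the two 100% hits both survive, so the same accession appears twice with different coordinates and strands, while the 97% hit is dropped for being more than 0.1 points below the best.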

Note this part of the RefSeq track description:

    RefSeq RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.

Please contact us again at [email protected] if you have any further questions.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group

On 7/13/2012 6:52 AM, sohini wrote:
> Hello,
> I have encountered a problem while extracting Refseq genomic sequence. For
> particular accession nos. There are duplicates, one in + strand, one in -
> strand, with different genomic coordinates but with same sequence. A specific
> example would be >hg19_refGene_NM_000854_0 in chr22. In chr 22 only, there are 3
> more examples, hg19_refGene_NM_000854_2 hg19_refGene_NM_000854_3
> hg19_refGene_NM_000854_1.
> Could you please explain how this might happen?
> Please reply soon.
>
> Sohini Chakraborty
> CSIR-Junior Research Fellow
> CoE in Bioinformatics
> Bose Institute
> DST, Govt. of India
>
> --
> Open WebMail Project (http://openwebmail.org)
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
