Hi Sohini,

The duplicate entries you are seeing come from RefSeq RNAs that aligned 
to more than one location in the genome; each alignment is listed as a 
separate entry. There are many such entries.

Note this part of the RefSeq track description:

RefSeq RNAs were aligned against the human genome using blat; those with 
an alignment of less than 15% were discarded. When a single RNA aligned 
in multiple places, the alignment having the highest base identity was 
identified. Only alignments having a base identity level within 0.1% of 
the best and at least 96% base identity with the genomic sequence were kept.
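
As a rough illustration of the rule quoted above, here is a minimal Python sketch. It is not the actual UCSC pipeline code: the Alignment record, its field names, the filter_alignments function, and the sample values are all hypothetical, and it assumes that "an alignment of less than 15%" refers to the fraction of the RNA that aligned.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    rna: str         # RefSeq accession (hypothetical sample values below)
    chrom: str
    start: int
    strand: str
    coverage: float  # fraction of the RNA that aligned, 0..1 (assumed meaning)
    identity: float  # base identity with the genomic sequence, 0..1


def filter_alignments(alignments, min_coverage=0.15,
                      near_best=0.001, min_identity=0.96):
    """Keep, per RNA, the alignments close to that RNA's best identity."""
    by_rna = {}
    for aln in alignments:
        # Step 1: discard alignments covering too little of the RNA.
        if aln.coverage < min_coverage:
            continue
        by_rna.setdefault(aln.rna, []).append(aln)

    kept = []
    for group in by_rna.values():
        # Step 2: find the highest base identity for this RNA.
        best = max(aln.identity for aln in group)
        # Step 3: keep alignments within 0.1% of the best and >= 96% identity.
        kept.extend(
            aln for aln in group
            if aln.identity >= best - near_best and aln.identity >= min_identity
        )
    return kept


# Made-up data: one RNA with two near-identical placements on opposite strands.
hits = [
    Alignment("NM_EXAMPLE1", "chr22", 100, "+", 1.00, 1.0000),
    Alignment("NM_EXAMPLE1", "chr22", 500, "-", 1.00, 0.9995),
    Alignment("NM_EXAMPLE1", "chr1", 900, "+", 1.00, 0.9700),
    Alignment("NM_EXAMPLE1", "chr2", 50, "+", 0.10, 1.0000),
    Alignment("NM_EXAMPLE2", "chr3", 10, "+", 1.00, 0.9500),
]
kept = filter_alignments(hits)
for aln in kept:
    print(aln.rna, aln.chrom, aln.start, aln.strand)
```

In this made-up input, both chr22 placements survive because their identities are within 0.1% of each other, which is how a single accession can end up with more than one entry.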

Please contact us again at [email protected] if you have any further 
questions.

---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group


On 7/13/2012 6:52 AM, sohini wrote:
> Hello,
> I have encountered a problem while extracting RefSeq genomic sequences. For
> particular accession numbers there are duplicates, one on the + strand and one
> on the - strand, with different genomic coordinates but the same sequence. A
> specific example is >hg19_refGene_NM_000854_0 on chr22. On chr22 alone there
> are 3 more examples: hg19_refGene_NM_000854_2, hg19_refGene_NM_000854_3, and
> hg19_refGene_NM_000854_1.
> Could you please explain how this might happen?
> Please reply soon.
>
> Sohini Chakraborty
> CSIR-Junior Research Fellow
> CoE in Bioinformatics
> Bose Institute
> DST, Govt. of India
>
> --
> Open WebMail Project (http://openwebmail.org)
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

