Hi Piroon,

I asked the engineer here who created the pipeline that maps RefSeq 
genes to the genome, and this is what he had to say:

---
The RefGene track uses blat, with more sophisticated filtering of the 
results. Here is what I see for these examples:

- NR_031679 - The top two hits:

> QUERY        SCORE START  END QSIZE IDENTITY CHRO STRAND     START       END  SPAN
> ----------------------------------------------------------------------------------
> NR_031679.1    118     1  118   118   100.0%   17      -  13387571  13387688   118
> NR_031679.1     81    12  101   118    95.6%   16      -  66647192  66647282    91


are discarded by the RefSeq alignment pipeline because the majority of 
the sequence aligned to repeats in the RepeatMasker track.  However, 
the RefGene alignment of NR_031679 doesn't look very good either: it has 
lower identity and intron-sized gaps without valid splice sites.

- NR_031644 and NR_030317 show similar alignment to repeats when blatted.

I think our mappings are wrong and the problem is that RepeatMasker is 
identifying these as repetitive regions of Mariner/MADE1. It will 
require more research to determine exactly what is going on. 
RepeatMasker does identify some elements that we should not be filtering 
against when aligning non-coding RNAs. We don't have an immediate fix, 
but I have added these to my list of problem cases.
---
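
As a rough illustration of the kind of repeat filter described above 
(this is NOT the actual pipeline code), the sketch below drops a hit 
when more than half of its genomic span overlaps repeat annotations. 
The function names, the 0.5 threshold, and the repeat interval in the 
usage note are all assumptions made up for the example; real repeat 
coordinates would come from the RepeatMasker track.

```python
def repeat_fraction(start, end, repeats):
    """Fraction of the half-open interval [start, end) covered by repeats.

    `repeats` is a list of (start, end) half-open intervals on the same
    chromosome; overlapping repeat intervals are only counted once.
    """
    if end <= start:
        return 0.0
    covered = 0
    pos = start  # everything left of `pos` has already been counted
    for r_start, r_end in sorted(repeats):
        lo = max(r_start, pos)
        hi = min(r_end, end)
        if hi > lo:
            covered += hi - lo
            pos = hi
    return covered / (end - start)


def keep_alignment(start, end, repeats, max_repeat_fraction=0.5):
    """Hypothetical filter: keep a hit unless the majority of it is repeat.

    The 0.5 cutoff is an assumption standing in for "the majority of the
    sequence aligned to repeats"; the real pipeline may use another rule.
    """
    return repeat_fraction(start, end, repeats) <= max_repeat_fraction
```

For example, the chr17 hit above spans 13387571-13387688 (1-based), 
which is 13387570-13387688 as a half-open interval. With a made-up 
repeat annotation, keep_alignment(13387570, 13387688, 
[(13387560, 13387670)]) would reject the hit, because 100 of its 118 
bases fall inside the repeat.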

Thank you for pointing this out to us and providing examples.  I don't 
know whether or when the pipeline might change so that these kinds of 
cases are improved.  In the meantime, the sno/miRNA track will probably 
be more useful for you.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Piroon J. wrote on 11/1/10 12:47 AM:
> Dear Sir/Admins,
> 
> I have a question and request for your RefSeq (non-coding genes) mapping 
> in hg18, UCSC Browser.
> I'm working on microRNA topic. The question is about mapping of RefSeq 
> microRNA. Some of RefSeq microRNA seem to be wrongly mapped on hg18. I 
> could give you a few RefSeq wrongly mapped e.g. NR_031679, NR_031644, 
> NR_030317, etc.
> 
> NR_031679 (MIR548H3) was mapped at chr1:105130114-105476802 while 
> hsa-mir-548h-3 was mapped at chr17:13387571-13387688 (proved by blat).
> NR_031644 (MIR548F3) was mapped at chr1:213146072-213334022 while 
> hsa-mir-548f-3 was mapped at chr5:109877429-109877515 (proved by blat).
> NR_030317 (MIR548A2) was mapped at chr13:78340250-78743319 while 
> hsa-mir-548a-2 was mapped at chr6:135601991-135602087 (proved by blat).
> 
> Would you like to fix these problem? These make me confuse and wonder 
> how reliable is this mapping of RefSeq non-coding genes.
> 
> Look forward to hearing from you. Any suggestions will be appreciated.
> 
> Thank you for your time.
> 
> Yours sincerely,
> Piroon
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
