Hi Piroon, I asked the engineer here who created the pipeline that maps RefSeq genes to the genome, and this is what he had to say:
---
The RefGene track uses blat, with more sophisticated filtering of the results. Here is what I see for these examples:

- NR_031679 - The top two hits:

QUERY        SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
---------------------------------------------------------------------------------------
NR_031679.1    118      1  118    118    100.0%    17  -       13387571  13387688   118
NR_031679.1     81     12  101    118     95.6%    16  -       66647192  66647282    91

are discarded by the RefSeq alignment pipeline because the majority of the sequence aligned to repeats in the RepeatMasker track. However, the RefGene alignment of NR_031679 doesn't look very good either: it has lower identity and intron-sized gaps without valid splice sites.

- NR_031644 and NR_030317 show similar alignment to repeats when blatted.

I think our mappings are wrong and the problem is that RepeatMasker is identifying these as repetitive regions of Mariner/MADE1. It will require more research to determine exactly what is going on. RepeatMasker does identify some elements that we should not be filtering against when aligning non-coding RNAs. We don't have an immediate fix, but I have added these to my list of problem cases.
---

Thank you for pointing this out to us and providing examples. I don't know whether or when the pipeline might change so that these kinds of cases are improved. In the meantime, the sno/miRNA track will probably be more useful for you.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Piroon J. wrote on 11/1/10 12:47 AM:
> Dear Sir/Admins,
>
> I have a question and request for your RefSeq (non-coding genes) mapping
> in hg18, UCSC Browser.
> I'm working on microRNA topic. The question is about mapping of RefSeq
> microRNA. Some of RefSeq microRNA seem to be wrongly mapped on hg18. I
> could give you a few RefSeq wrongly mapped e.g. NR_031679, NR_031644,
> NR_030317, etc.
>
> NR_031679 (MIR548H3) was mapped at chr1:105130114-105476802 while
> hsa-mir-548h-3 was mapped at chr17:13387571-13387688 (proved by blat).
> NR_031644 (MIR548F3) was mapped at chr1:213146072-213334022 while
> hsa-mir-548f-3 was mapped at chr5:109877429-109877515 (proved by blat).
> NR_030317 (MIR548A2) was mapped at chr13:78340250-78743319 while
> hsa-mir-548a-2 was mapped at chr6:135601991-135602087 (proved by blat).
>
> Would you like to fix these problem? These make me confuse and wonder
> how reliable is this mapping of RefSeq non-coding genes.
>
> Look forward to hearing from you. Any suggestions will be appreciated.
>
> Thank you for your time.
>
> Yours sincerely,
> Piroon
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
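
For readers who want to see what the repeat filtering described in the engineer's reply might look like, here is a minimal sketch in Python. It is not the actual RefSeq alignment pipeline code, which is not shown in this thread. The function names, the half-open coordinate convention, and the 0.5 cutoff (one reading of "the majority of the sequence") are all assumptions made for illustration.

```python
from typing import List, Tuple

# Half-open genomic interval [start, end) on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(hit: Interval, repeats: List[Interval]) -> float:
    """Fraction of the aligned span that falls inside annotated repeats."""
    hit_start, hit_end = hit
    length = hit_end - hit_start
    if length <= 0:
        return 0.0
    covered = 0
    for rep_start, rep_end in merge_intervals(repeats):
        overlap = min(hit_end, rep_end) - max(hit_start, rep_start)
        if overlap > 0:
            covered += overlap
    return covered / length


def keep_hit(hit: Interval, repeats: List[Interval],
             max_repeat_fraction: float = 0.5) -> bool:
    """Hypothetical filter: discard a hit when most of it lies in repeats."""
    return repeat_fraction(hit, repeats) <= max_repeat_fraction
```

Under these assumptions, a hit that is 80% covered by a repeat annotation would be dropped, which mirrors how the chr17 and chr16 hits for NR_031679 were discarded. A filter of this kind would need an exception for repeat classes that legitimately overlap non-coding RNAs, which is the problem the reply identifies with Mariner/MADE1.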
