Hi, Sebastian! We have tracked down the reason for this liftOver behavior.
The same-species liftover files are chains that have been netted. The netting process means that the Zv7 side is entirely single-coverage. Therefore duplications in Zv8 will not show up in the liftOver mapping. The netter will have picked the best region, or arbitrarily chosen one if they are equally good.

A Zv7 region that was simply split up in Zv8 could produce multiple output rows with "Allow multiple output regions" checked, but that's because the split region is still single-coverage on the Zv7 side. Thus splitting is entirely different from duplication. Although you do get two pieces out of both processes, they are otherwise different.

If you need all duplicated regions, then the recommendation is to simply use BLAT or some other tool to realign your 60bp sequences to Zv8. This should be quick and easy.

More facts about chains and nets:

If one had an un-netted version of the chains and used -multiple, one would see both duplicate regions in the output, as you were expecting to see. However, netting is generally desirable for other reasons. Netting is desirable when going across species. The self-chain files are usually NOT netted.

-Galt

2/25/2010 4:13 PM, Sebastian Hoersch:
> Hi,
>
> After using the liftOver tool (web version) between Zebrafish assemblies Zv7
> and Zv8 for several thousand 60mer sequences, spot-checking with BLAT
> revealed instances of perfect matches missing from the liftOver results as
> detailed below.
> I can imagine reasons why this effect may be to be expected at some frequency
> (e.g. due to minimum length requirements for a genomic region in one assembly
> to correspond to a region in another), but I have not been able to find
> detailed enough documentation on the liftOver utility to make a qualified
> assessment in this regard.
>
> I would be grateful for more information on the liftOver tool that can shed
> light on this issue.
>
> Thanks much and kind regards
> Sebastian
>
> _____________________________________________________
> Sebastian Hoersch
> Koch Institute for Integrative Cancer Research at MIT
> Bioinformatics and Computing Core
> 77 Massachusetts Avenue (E18-366)
> Cambridge, MA 02139
> phone: 1-617-324-1728
> email: [email protected]
>
>
> Examples:
>
> (1) Sequence with one perfect BLAT match in Zv7 and two perfect BLAT matches
> in Zv8 – liftOver returns only one match (despite allowing multiple output
> regions):
>
> >P07475181
> AAAAAATGTAAATAACGTGGGAAAAATCCTTGTTAAATTGTTAACGTGATCCTTGCTGAA
>
> Zv7: BLAT Search Results
> ACTIONS          QUERY      SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details  P07475181     60      1   60     60    100.0%  25    -       11982433  11982492    60
>
> Zv8: BLAT Search Results
> ACTIONS          QUERY      SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details  P07475181     60      1   60     60    100.0%  25    -       21776926  21776985    60
> browser details  P07475181     60      1   60     60    100.0%  25    -       21878781  21878840    60
>
>
> liftOver results with checkbox “Allow multiple output regions:” checked (BED
> format):
> chr25 21776926 21776985 - 1
>
> Note: Based on SelfChain data, it appears that genomic sequence surrounding
> the query sequence is duplicated in Zv8, but not in Zv7.
>
> (2) Sequence with one perfect BLAT match in Zv7 and in Zv8 – liftOver fails
> with “#Deleted in new”
>
> >P01179900
> AAGAAACTGAGAGAACAAATCAAGGAAAAAAATGACAATCTGCAGAGAGAGAACTTCCAT
>
> Zv7: BLAT Search Results
> ACTIONS          QUERY      SCORE  START  END  QSIZE  IDENTITY  CHRO              STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details  P01179900     60      1   60     60    100.0%  1                 +       28694772  28694831    60
> browser details  P01179900     27      1   31     60     93.6%  Zv7_scaffold2625  -       179464    179494      31
>
> Zv8: BLAT Search Results
> ACTIONS          QUERY      SCORE  START  END  QSIZE  IDENTITY  CHRO              STRAND  START     END       SPAN
> ---------------------------------------------------------------------------------------------------
> browser details  P01179900     60      1   60     60    100.0%  1                 -       35253773  35253832    60
> browser details  P01179900     27      1   31     60     93.6%  14                -       7435368   7435398     31
> browser details  P01179900     27      1   31     60     93.6%  Zv8_scaffold1706  -       179464    179494      31
> browser details  P01179900     22      9   40     60     84.4%  2                 +       43019472  43019503    32
> browser details  P01179900     20     29   50     60     95.5%  10                +       42695956  42695977    22
>
> liftOver results in failure:
> #Deleted in new
> chr1:28694772-28694831
> ==
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
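
To illustrate the netted versus un-netted distinction made in the reply at the top of this message, here is a toy sketch in Python. It is not how the UCSC tools are implemented. The `Chain` class and the `lift` and `net` functions are hypothetical names, the coordinates and scores are made up, and real netting is more involved (for example, it can fill gaps with pieces of lower-scoring chains rather than discarding them whole). The sketch only shows why a region duplicated in the new assembly yields one result once the old-assembly side is forced to single coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """Toy gapless alignment block: old [old_start, old_end) maps to new_start onward."""
    old_start: int
    old_end: int
    new_start: int
    score: int


def lift(chains, start, end):
    """Map old interval [start, end) through every chain that fully contains it."""
    hits = []
    for c in chains:
        if c.old_start <= start and end <= c.old_end:
            offset = c.new_start - c.old_start
            hits.append((start + offset, end + offset))
    return hits


def net(chains):
    """Toy netting: visit chains best score first and keep a chain only if it
    does not overlap an already kept chain on the old-assembly side."""
    kept = []
    for c in sorted(chains, key=lambda c: -c.score):
        if all(c.old_end <= k.old_start or k.old_end <= c.old_start for k in kept):
            kept.append(c)
    return kept


# Made-up example: old region 100-200 is present twice in the new assembly.
chains = [
    Chain(old_start=100, old_end=200, new_start=1000, score=900),
    Chain(old_start=100, old_end=200, new_start=5000, score=800),  # duplicate copy
    Chain(old_start=300, old_end=400, new_start=7000, score=500),
]

print("un-netted:", lift(chains, 120, 180))       # both copies are reported
print("netted:   ", lift(net(chains), 120, 180))  # only the best-scoring copy
```

With the un-netted chains the query maps to both copies; after the toy netting step only the higher-scoring copy remains, which mirrors the single liftOver hit reported in example (1).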
