Hi, Sebastian!

We have tracked down the reason for this liftOver behavior.

The same-species liftOver files are chains
that have been netted. The netting process
means that the Zv7 side is entirely single-coverage:
each Zv7 base maps to at most one place in Zv8.

Therefore duplications in Zv8 will not show
up in the liftOver mapping. For each Zv7 region,
the netter will have picked the best-scoring Zv8
copy, or chosen one arbitrarily if the copies are
equally good.
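To make the single-coverage point concrete, here is a toy sketch, assuming a greatly simplified model: each alignment is a gapless Zv7-to-Zv8 block with a score, and the hypothetical `net_single_coverage` keeps blocks greedily by score, rejecting any whose Zv7 bases are already claimed. The real UCSC netter is far more elaborate; only the "each Zv7 base is used once" property is modeled here, and all coordinates are made up.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified gapless alignment from Zv7 (source) to Zv8."""
    zv7_start: int  # half-open Zv7 interval [zv7_start, zv7_end)
    zv7_end: int
    zv8_chrom: str
    zv8_start: int
    score: int


def overlaps_on_zv7(a: Alignment, b: Alignment) -> bool:
    """True if the two alignments claim any of the same Zv7 bases."""
    return a.zv7_start < b.zv7_end and b.zv7_start < a.zv7_end


def net_single_coverage(alignments: List[Alignment]) -> List[Alignment]:
    """Toy netter: visit alignments best score first and keep one only if its
    Zv7 bases are still unclaimed, so the Zv7 side ends up single-coverage.
    sorted() stays stable with reverse=True, so among equal scores the
    earlier input wins, standing in for the netter's arbitrary pick."""
    kept: List[Alignment] = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        if not any(overlaps_on_zv7(aln, k) for k in kept):
            kept.append(aln)
    return kept


# One Zv7 region aligning equally well to two Zv8 copies (made-up coordinates).
dup = [
    Alignment(100, 200, "chr25", 5000, 100),
    Alignment(100, 200, "chr25", 9000, 100),
]
print(net_single_coverage(dup))  # only the chr25:5000 copy survives
```

Lifting through `dup` as-is, without the netting step, would report both copies; that corresponds to the un-netted chains case mentioned below.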

A Zv7 region that was simply split up in Zv8
can produce multiple output rows when "Allow
multiple output regions" is checked, but that is
because the split region is still single-coverage
on the Zv7 side.

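Thus splitting is entirely different from duplication.
Although both can give you two pieces in Zv8, a split
divides one Zv7 region into non-overlapping parts that
each map once, whereas a duplication would need the
same Zv7 bases to map to two places, which netting rules out.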
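The difference can be sketched with a toy projection, assuming already-netted, gapless, plus-strand blocks (the `Block` type and `lift` function are hypothetical, not liftOver's code). A query spanning a split yields one piece per block; a netted duplication yields a single piece because only one copy remains.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical netted, gapless block mapping Zv7 [src_start, src_end)
    onto Zv8 starting at dst_start; + strand only, for simplicity."""
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int


def lift(q_start: int, q_end: int, blocks: List[Block]) -> List[Tuple[str, int, int]]:
    """Project the half-open Zv7 interval [q_start, q_end) through every block it
    overlaps, returning one Zv8 piece per block (roughly what 'Allow multiple
    output regions' reports). Real liftOver also enforces -minMatch, ignored here."""
    pieces = []
    for b in blocks:
        lo, hi = max(q_start, b.src_start), min(q_end, b.src_end)
        if lo < hi:  # the query shares at least one base with this block
            shift = b.dst_start - b.src_start
            pieces.append((b.dst_chrom, lo + shift, hi + shift))
    return pieces


# Split: two disjoint Zv7 halves, each still mapped exactly once.
split = [Block(100, 150, "chr1", 1000), Block(150, 200, "chr1", 3000)]
print(lift(120, 180, split))       # [('chr1', 1020, 1050), ('chr1', 3000, 3030)]

# Duplication after netting: only one copy of Zv7 [100, 200) is left.
netted_dup = [Block(100, 200, "chr1", 1000)]
print(lift(120, 180, netted_dup))  # [('chr1', 1020, 1080)]
```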

If you need all duplicated regions, then
the recommendation is to simply use BLAT or some
other tool to realign your 60bp sequences to Zv8.
This should be quick and easy.
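For a quick sanity check on a few candidate chromosomes already loaded into memory, a naive exact search on both strands finds every perfect hit of a 60-mer. This is a simple stand-in, not BLAT: it only finds 100% matches, and the helper names (`revcomp`, `find_perfect_matches`) are hypothetical. For genome-wide searches, BLAT or a similar aligner remains the practical tool.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_perfect_matches(query: str, chrom_seq: str) -> List[Tuple[str, int, int]]:
    """Every exact occurrence of query on either strand, as (strand, start, end)
    in 0-based half-open (BED-style) coordinates. Both sequences are uppercased
    so soft-masked (lowercase) genome sequence still matches."""
    genome = chrom_seq.upper()
    hits = []
    for strand, probe in (("+", query.upper()), ("-", revcomp(query.upper()))):
        pos = genome.find(probe)
        while pos != -1:
            hits.append((strand, pos, pos + len(probe)))
            pos = genome.find(probe, pos + 1)  # step by 1 so overlapping hits are found too
    return sorted(hits, key=lambda h: h[1])


q = "AAAAAATGTAAATAACGTGGGAAAAATCCTTGTTAAATTGTTAACGTGATCCTTGCTGAA"  # P07475181
# Made-up toy sequence holding two minus-strand copies, one of them soft-masked.
toy_chrom = "N" * 10 + revcomp(q) + "N" * 5 + revcomp(q).lower() + "N" * 3
print(find_perfect_matches(q, toy_chrom))  # [('-', 10, 70), ('-', 75, 135)]
```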

More facts about chains and nets:

If one had an un-netted version of the chains
and used -multiple (the command-line form of
"Allow multiple output regions"), one would see both
duplicate regions in the output, as you expected.
However, netting is generally desirable for other reasons.

Netting is desirable when going across species.

The self-chain files are usually NOT netted.

-Galt

On 2/25/2010 4:13 PM, Sebastian Hoersch wrote:
> Hi,
>
> After using the liftOver tool (web version) between Zebrafish assemblies Zv7 
> and Zv8 for several thousand 60mer sequences, spot-checking with BLAT 
> revealed instances of perfect matches missing from the liftOver results as 
> detailed below.
> I can imagine reasons why this effect might be expected at some frequency 
> (e.g. due to minimum length requirements for a genomic region in one assembly 
> to correspond to a region in another), but I have not been able to find 
> detailed enough documentation on the liftOver utility to make a qualified 
> assessment in this regard.
>
> I would be grateful for more information on the liftOver tool that can shed 
> light on this issue.
>
> Thanks much and kind regards
> Sebastian
>
> _____________________________________________________
> Sebastian Hoersch
> Koch Institute for Integrative Cancer Research at MIT
> Bioinformatics and Computing Core
> 77 Massachusetts Avenue (E18-366)
> Cambridge, MA 02139
> phone: 1-617-324-1728
> email: [email protected]
>
>
> Examples:
>
> (1) Sequence with one perfect BLAT match in Zv7 and two perfect BLAT matches 
> in Zv8 – liftOver returns only one match (despite allowing multiple output 
> regions):
>
> >P07475181
> AAAAAATGTAAATAACGTGGGAAAAATCCTTGTTAAATTGTTAACGTGATCCTTGCTGAA
>
> Zv7: BLAT Search Results
> QUERY     SCORE START  END QSIZE IDENTITY CHRO STRAND     START       END SPAN
> ------------------------------------------------------------------------------
> P07475181    60     1   60    60   100.0% 25   -       11982433  11982492   60
>
> Zv8: BLAT Search Results
> QUERY     SCORE START  END QSIZE IDENTITY CHRO STRAND     START       END SPAN
> ------------------------------------------------------------------------------
> P07475181    60     1   60    60   100.0% 25   -       21776926  21776985   60
> P07475181    60     1   60    60   100.0% 25   -       21878781  21878840   60
>
>
> liftOver results with checkbox “Allow multiple output regions:” checked (BED 
> format):
> chr25 21776926        21776985        -       1
>
> Note: Based on SelfChain data, it appears that genomic sequence surrounding 
> the query sequence is duplicated in Zv8, but not in Zv7.
>
> (2) Sequence with one perfect BLAT match in Zv7 and in Zv8 – liftOver fails 
> with “#Deleted in new”
>
> >P01179900
> AAGAAACTGAGAGAACAAATCAAGGAAAAAAATGACAATCTGCAGAGAGAGAACTTCCAT
>
> Zv7: BLAT Search Results
> QUERY     SCORE START  END QSIZE IDENTITY CHRO             STRAND     START       END SPAN
> ------------------------------------------------------------------------------------------
> P01179900    60     1   60    60   100.0% 1                +       28694772  28694831   60
> P01179900    27     1   31    60    93.6% Zv7_scaffold2625 -         179464    179494   31
>
> Zv8: BLAT Search Results
> QUERY     SCORE START  END QSIZE IDENTITY CHRO             STRAND     START       END SPAN
> ------------------------------------------------------------------------------------------
> P01179900    60     1   60    60   100.0% 1                -       35253773  35253832   60
> P01179900    27     1   31    60    93.6% 14               -        7435368   7435398   31
> P01179900    27     1   31    60    93.6% Zv8_scaffold1706 -         179464    179494   31
> P01179900    22     9   40    60    84.4% 2                +       43019472  43019503   32
> P01179900    20    29   50    60    95.5% 10               +       42695956  42695977   22
>
> liftOver results in failure:
> #Deleted in new
> chr1:28694772-28694831
> ==
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
