Hi,

I used the liftOver tool (web version) to lift several thousand 60-mer 
sequences from Zebrafish assembly Zv7 to Zv8. Spot-checking with BLAT revealed 
perfect matches that are missing from the liftOver results, as detailed below.
I can imagine reasons why this might be expected at some frequency (e.g. a 
minimum length requirement for a genomic region in one assembly to correspond 
to a region in the other), but I have not found documentation on the liftOver 
utility detailed enough to make a qualified assessment.

I would be grateful for more information on the liftOver tool that can shed 
light on this issue.

Thanks much and kind regards
Sebastian

_____________________________________________________
Sebastian Hoersch
Koch Institute for Integrative Cancer Research at MIT
Bioinformatics and Computing Core
77 Massachusetts Avenue (E18-366)
Cambridge, MA 02139
phone: 1-617-324-1728
email: [email protected]


Examples:

(1) Sequence with one perfect BLAT match in Zv7 and two perfect BLAT matches in 
Zv8 – liftOver returns only one match (despite allowing multiple output 
regions):

>P07475181
AAAAAATGTAAATAACGTGGGAAAAATCCTTGTTAAATTGTTAACGTGATCCTTGCTGAA

Zv7: BLAT Search Results
   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details P07475181    60     1    60    60 100.0%    25   -   11982433  11982492     60

Zv8: BLAT Search Results
   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details P07475181    60     1    60    60 100.0%    25   -   21776926  21776985     60
browser details P07475181    60     1    60    60 100.0%    25   -   21878781  21878840     60


liftOver results with checkbox “Allow multiple output regions:” checked (BED 
format):
chr25   21776926        21776985        -       1

Note: Based on SelfChain data, it appears that genomic sequence surrounding the 
query sequence is duplicated in Zv8, but not in Zv7.

(2) Sequence with one perfect BLAT match in Zv7 and in Zv8 – liftOver fails 
with “#Deleted in new”

>P01179900
AAGAAACTGAGAGAACAAATCAAGGAAAAAAATGACAATCTGCAGAGAGAGAACTTCCAT

Zv7: BLAT Search Results
   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details P01179900    60     1    60    60 100.0%     1   +   28694772  28694831     60
browser details P01179900    27     1    31    60  93.6%  Zv7_scaffold2625   -     179464    179494     31

Zv8: BLAT Search Results
   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details P01179900    60     1    60    60 100.0%     1   -   35253773  35253832     60
browser details P01179900    27     1    31    60  93.6%    14   -    7435368   7435398     31
browser details P01179900    27     1    31    60  93.6%  Zv8_scaffold1706   -     179464    179494     31
browser details P01179900    22     9    40    60  84.4%     2   +   43019472  43019503     32
browser details P01179900    20    29    50    60  95.5%    10   +   42695956  42695977     22

liftOver results in failure:
#Deleted in new
chr1:28694772-28694831
==
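
For reference, the kind of spot check described above can be sketched in 
Python roughly as follows. This is only a minimal sketch under stated 
assumptions: the names Region, perfect_blat_hits and missing_from_liftover are 
hypothetical, the rows are typed in by hand from example (1) rather than parsed 
from real output files, and it assumes that the BLAT and liftOver coordinates 
follow the same convention and that bare chromosome names only need a "chr" 
prefix to match the liftOver output.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval, named as in the liftOver output (e.g. chr25)."""
    chrom: str
    start: int
    end: int


# One BLAT row reduced to (score, identity in percent, chrom, start, end).
BlatRow = Tuple[int, float, str, int, int]


def perfect_blat_hits(rows: Iterable[BlatRow], qsize: int = 60) -> List[Region]:
    """Keep only full-length, 100% identity hits for a query of length qsize."""
    hits = []
    for score, identity, chrom, start, end in rows:
        if score == qsize and identity == 100.0:
            # Assumption: BLAT prints "25" where liftOver prints "chr25";
            # scaffold names such as "Zv8_scaffold1706" are left unchanged.
            name = chrom if chrom.startswith(("chr", "Zv")) else "chr" + chrom
            hits.append(Region(name, start, end))
    return hits


def missing_from_liftover(blat_hits: Iterable[Region],
                          lifted: Iterable[Region]) -> List[Region]:
    """Return the perfect BLAT hits that have no identical liftOver region."""
    lifted_set = set(lifted)
    return [hit for hit in blat_hits if hit not in lifted_set]


# Data copied by hand from example (1): Zv8 BLAT hits and the liftOver output.
zv8_blat_rows = [
    (60, 100.0, "25", 21776926, 21776985),
    (60, 100.0, "25", 21878781, 21878840),
]
liftover_regions = [Region("chr25", 21776926, 21776985)]

missing = missing_from_liftover(perfect_blat_hits(zv8_blat_rows),
                                liftover_regions)
for region in missing:
    print(f"{region.chrom}:{region.start}-{region.end}")
```

With the values from example (1), this reports the second Zv8 BLAT hit 
(chr25:21878781-21878840) as absent from the liftOver output.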

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
