Hi all,

we're using liftOver quite extensively to translate coordinates  
between different species. In general, it seems to work quite well for  
us and the results typically make sense when we inspect them visually.  
However, sometimes we run into problems, especially for coordinate
conversions between more distantly related species. Unfortunately, we
could not find a more detailed description of how liftOver works  
(apart from the short help it prints) and what the command line  
parameters do - we hope someone can help.

It is our understanding that liftOver essentially uses the UCSC  
alignments (or the underlying data) for the conversions. This should  
mean that any input region can map to 0, 1, or several contiguous  
regions in the target genome, that the region length can change, and  
that only a certain fraction of the input nucleotides correspond to  
(i.e. map to) target nucleotides.
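
To make this mental model concrete, here is a minimal Python sketch of
what we imagine happens. It is only our assumption, not liftOver's actual
implementation: the function name lift_interval and the block coordinates
are made up for illustration, and each "block" is a toy ungapped alignment
on a single chromosome and strand.

```python
from typing import List, Tuple

# Toy ungapped alignment block: (source_start, source_end, target_start),
# half-open coordinates. All values below are invented for illustration.
Block = Tuple[int, int, int]


def lift_interval(start: int, end: int, blocks: List[Block]):
    """Map [start, end) through the blocks.

    Returns the list of target pieces and the fraction of input bases
    that fall inside some block (our guess at what "remap" measures).
    Assumes the blocks do not overlap in source coordinates.
    """
    pieces = []
    mapped = 0
    for s_start, s_end, t_start in blocks:
        lo = max(start, s_start)
        hi = min(end, s_end)
        if lo < hi:  # the input region overlaps this block
            offset = t_start - s_start
            pieces.append((lo + offset, hi + offset))
            mapped += hi - lo
    fraction = mapped / (end - start) if end > start else 0.0
    return pieces, fraction


# Two blocks separated by a 10-base gap in the source and a 20-base gap
# in the target.
blocks = [(100, 150, 1000), (160, 200, 1070)]
pieces, fraction = lift_interval(120, 180, blocks)
print(pieces, fraction)
```

In this toy case the 60-base input region yields two target pieces
spanning 70 bases in total, and 50 of the 60 input bases (5/6) have a
target counterpart. A region that overlaps no block yields no pieces and
a fraction of 0.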

We assume that the behavior of liftOver with respect to these can be  
controlled using the following parameters:

-minMatch=0.N            Minimum ratio of bases that must remap. Default 0.95
-minBlocks=0.N           Minimum ratio of alignment blocks/exons that must map
                         (default 1.00)
-fudgeThick              If thickStart/thickEnd is not mapped, use the closest
                         mapped base.  Recommended if using -minBlocks.
-multiple                Allow multiple output regions
-minChainT, -minChainQ   Minimum chain size in target/query, when
                         mapping to multiple output regions (default 0, 0)

Could you please give some details on what exactly the parameters do?  
This is very important for us to know in order to use the tool  
appropriately. For example:

1. What does "remap" mean for the minMatch parameter?
Is it the fraction of input bases that have a target counterpart, i.e.  
that would appear aligned in a sequence alignment (or is it the  
fraction of target-bases that have an input counterpart)?

When relaxing this parameter, we typically get more lifted regions.  
Are these however still orthologous/unique or will we run into a  
specificity problem? I understand that liftOver only uses a
precomputed alignment (or coordinate lookup table) that - in principle -
only contains alignments between orthologous regions. In other words,  
I do NOT expect liftOver to simply find more and more "matches" that  
make less and less sense as e.g. blast would do when lowering its  
specificity.

2. How does the minMatch parameter influence the growing & shrinking
of region length?
Does a more relaxed minMatch parameter allow for more variable region
length between input and target regions? In other words: if it only
assesses the fraction of input nucleotides that have a counterpart,
the region could grow freely but not shrink (and vice versa if it
assesses the fraction of target nucleotides).

3. Will we "lose" regions?
When lowering minMatch, will regions that are uniquely mapped with a  
stringent minMatch parameter map to multiple regions/blocks and thus  
become unmapped?

4. Does "multiple" allow an input region to span multiple output
blocks, or does it allow non-unique mapping (of the same region)?

5. What do minChainT and minChainQ mean (i.e. what is a chain size,
etc.)?

6. What does minBlocks do? Does it apply to regions that span multiple
alignment blocks and require that the same number of alignment blocks
be present in the input and target region?

Very many thanks for your help in advance and sorry for all the  
questions.

Best,

Alex





**********
Alexander Stark, PhD
Group Leader
Institute of Molecular Pathology (IMP)
Dr. Bohr-Gasse 7; 1030 Vienna
Austria

Tel. +43 (1) 79730-3380
[email protected]
http://www.imp.ac.at/research/alexander-stark/


