Hi Jennifer,

Many thanks for the detailed explanations. I think I now
understand how it works and where to be careful. We'll run some tests
on the specific species and regions we're dealing with.

I have to admit, though, that I didn't understand your comment about the
Conservation track: looking e.g. at Drosophila melanogaster, the
Conservation track (full view) contains the phastCons track and a
graphical representation of the alignment blocks. In the Table
Browser, the Conservation track contains the tables phastCons,
multiz15waySummary, multiz15wayFrames, and multiz15way. I'm not sure
how I would use them for an automated genome-wide homology mapping
between the species (with all the caveats regarding homology/orthology
you mentioned). Or were you recommending this only for manual
analysis of a few regions?

Many thanks and best wishes,

Alex

On Aug 21, 2009, at 8:08 PM, Jennifer Jackson wrote:

> Hi Alex,
>
> Using liftOver for cross-species comparisons can be tricky. For
> each pair, it may take a few passes to get all alignments, making
> the parameters more permissive each time, and perhaps ranking the
> results based on how strict the parameters were.
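A minimal Python sketch of the multi-pass idea, assuming each liftOver run has already been parsed into a dictionary from region name to lifted intervals (the parsing step, region names and coordinates are hypothetical). For each region it keeps the result of the strictest -minMatch pass that lifted it, so more permissive passes only add regions that were still missing.

```python
from typing import Dict, List, Tuple

# (chrom, start, end); illustrative only.
Interval = Tuple[str, int, int]
PassResult = Dict[str, List[Interval]]


def rank_by_strictness(
    passes: List[Tuple[float, PassResult]],
) -> Dict[str, Tuple[float, List[Interval]]]:
    """Map each region to (minMatch of the strictest pass that lifted it, intervals).

    An empty interval list means the region was unmapped in that pass.
    """
    best: Dict[str, Tuple[float, List[Interval]]] = {}
    # Visit passes from strictest (highest minMatch) to most permissive.
    for min_match, results in sorted(passes, key=lambda p: p[0], reverse=True):
        for name, intervals in results.items():
            if intervals and name not in best:
                best[name] = (min_match, intervals)
    return best


# Hypothetical results from three passes over the same input regions.
passes = [
    (0.9, {"enh1": [("chr2L", 100, 600)], "enh2": []}),
    (0.5, {"enh1": [("chr2L", 90, 610)], "enh2": [("chr3R", 5000, 5400)]}),
    (0.1, {"enh3": [("chrX", 10, 200), ("chr2R", 30, 220)]}),
]
ranked = rank_by_strictness(passes)
for name, (min_match, intervals) in sorted(ranked.items()):
    print(name, min_match, intervals)
```

The stored minMatch value then works as a crude confidence label: regions first lifted at 0.1 deserve more scrutiny than those lifted at 0.9.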
>
> For same-species conversions, our defaults are minMatch=0.9 and
> multiple=N. This means that there must be at least 90% identity to a
> single location in order for the mapping to work.
>
> For cross-species conversions, our defaults are minMatch=0.1 and
> multiple=Y. This means that there must be at least 10% identity, and
> multiple locations may be reported.
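The two default settings can be illustrated with a simplified threshold check. This Python sketch follows the usage text quoted further down the thread ("Minimum ratio of bases that must remap"), i.e. it treats minMatch as the fraction of input bases that land in aligned sequence; that reading, the class and function names, and the example numbers are assumptions, not liftOver's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiftSettings:
    min_match: float  # minimum ratio of input bases that must remap (assumed reading)
    multiple: bool    # allow multiple output regions


# Defaults as quoted in the reply above.
SAME_SPECIES = LiftSettings(min_match=0.9, multiple=False)
CROSS_SPECIES = LiftSettings(min_match=0.1, multiple=True)


def passes_min_match(mapped_bases: int, region_length: int, settings: LiftSettings) -> bool:
    """Simplified check: does the remapped fraction reach the threshold?"""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return mapped_bases / region_length >= settings.min_match


# A hypothetical 500 bp region of which 320 bases fall in aligned blocks (ratio 0.64).
print(passes_min_match(320, 500, SAME_SPECIES))   # False: 0.64 < 0.9
print(passes_min_match(320, 500, CROSS_SPECIES))  # True: 0.64 >= 0.1
```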
>
> The other parameters that you mention work as defined in the usage
> message; there is not much to add. Increasing the minimum chain size
> for either query or target (-minChainQ, -minChainT) will filter out
> shorter alignments, etc.
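To make "chain size" concrete, the Python sketch below parses chain header lines (format `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, as in the UCSC chain file documentation) and drops chains whose target or query span is below a threshold, best-scoring first. Treating "chain size" as the span `tEnd - tStart` / `qEnd - qStart` is an assumption; liftOver's internal check may differ, and the example lines are made up.

```python
from typing import List, NamedTuple


class ChainHeader(NamedTuple):
    score: float
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int
    chain_id: str


def parse_header(line: str) -> ChainHeader:
    """Parse 'chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id'."""
    f = line.split()
    if len(f) != 13 or f[0] != "chain":
        raise ValueError(f"not a chain header: {line!r}")
    return ChainHeader(float(f[1]), f[2], int(f[5]), int(f[6]),
                       f[7], int(f[10]), int(f[11]), f[12])


def filter_chains(headers: List[ChainHeader], min_chain_t: int = 0,
                  min_chain_q: int = 0) -> List[ChainHeader]:
    """Keep chains spanning at least min_chain_t target and min_chain_q query bases,
    sorted by score, highest first."""
    kept = [h for h in headers
            if h.t_end - h.t_start >= min_chain_t and h.q_end - h.q_start >= min_chain_q]
    return sorted(kept, key=lambda h: h.score, reverse=True)


# Hypothetical chain headers (names, sizes and scores are invented).
headers = [parse_header(l) for l in [
    "chain 4900 chr2L 23011544 + 1000 1600 scaf_1 900000 + 2000 2590 1",
    "chain 300 chr2L 23011544 + 5000 5080 scaf_7 400000 - 100 170 2",
    "chain 7200 chr2L 23011544 + 9000 10500 scaf_3 700000 + 300 1700 3",
]]
kept = filter_chains(headers, min_chain_t=100, min_chain_q=100)
print([h.chain_id for h in kept])  # ['3', '1']
```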
>
> The files that liftOver is based on (chains) are not evaluated to
> confirm that they are syntenic regions. They are only ranked by
> probability (see the chain file documentation in the link from
> Kayla). And the genes associated with these regions are certainly
> not confirmed orthologs. They could be paralogs, homologs, or
> orthologs. But as you know, ortholog implies similar function, which
> is a more complicated question and is frequently not a 1-1
> relationship, nor identifiable by sequence homology for more distant
> species. So you are correct to be suspicious of the "more hits" you
> are getting.
>
> For a different view that incorporates more factors (evolutionary
> distance, for example), use the Conservation track. It is better
> suited for cross-species syntenic analysis. Then, if you find
> sequences that you believe are orthologs, and they also have strong
> syntenic evidence, that may give you more confidence in that
> ortholog assignment, at least for evolutionarily close species. Not
> all true orthologs will be syntenic, especially as the evolutionary
> distance increases.
>
> Sorry that there is no exact answer to your question. The liftOver
> utility is just a tool. Experimenting with parameters and then
> evaluating the results from a scientific perspective is the
> recommended analysis path. Each species pair, and even each genomic
> location, may need special handling. LiftOver is explicitly not
> recommended for cross-species comparisons in the documentation, but
> it is used anyway by many for a first look, especially for species
> with close evolutionary distance.
>
> Good luck,
> Jennifer Jackson
>
> ------------------------------------------------
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> ----- "Alexander Stark" <[email protected]> wrote:
>
>> From: "Alexander Stark" <[email protected]>
>> To: "Kayla Smith" <[email protected]>
>> Cc: [email protected]
>> Sent: Friday, August 21, 2009 7:22:12 AM GMT -08:00 US/Canada Pacific
>> Subject: Re: [Genome] (advanced) liftOver questions
>>
>> Dear Kayla,
>>
>> Thank you very much for your quick response, the explanations and the
>> links. They were very helpful in understanding your pipeline and the
>> concepts of chains and nets.
>>
>> However, I didn't find an explanation of the command-line
>> parameters of liftOver and how they affect the results. I'd be
>> extremely happy if someone could find some time to check my
>> problems/questions.
>>
>> Very many thanks in advance,
>>
>> Alex
>>
>> On Aug 21, 2009, at 1:12 AM, Kayla Smith wrote:
>>
>>>
>>> Hello Alexander,
>>>
>>> Please see this previously answered mailing list question, "How does
>>> LiftOver Work?":
>>>
>>> https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015810.html
>>>
>>> Hopefully this should answer some of your questions. If you still
>>> require assistance, please write back to us.
>>>
>>> Kayla Smith
>>> UCSC Genome Bioinformatics Group
>>>
>>> ----- "Alexander Stark" <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> we're using liftOver quite extensively to translate coordinates
>>>> between different species. In general, it seems to work quite well
>>>> for us and the results typically make sense when we inspect them
>>>> visually.
>>>>
>>>> However, sometimes we run into problems, especially for coordinate
>>>> conversions between more distantly related species. Unfortunately, we
>>>> could not find a more detailed description of how liftOver works
>>>> (apart from the short help it prints) and what the command-line
>>>> parameters do. We hope someone can help.
>>>>
>>>> It is our understanding that liftOver essentially uses the UCSC
>>>> alignments (or the underlying data) for the conversions. This should
>>>> mean that any input region can map to 0, 1, or several contiguous
>>>> regions in the target genome, that the region length can change, and
>>>> that only a certain fraction of the input nucleotides correspond to
>>>> (i.e. map to) target nucleotides.
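The mental model above (partial base coverage, changing region length) can be made concrete with a toy projection of one interval through the ungapped blocks of a single chain. This Python sketch is a simplified model under stated assumptions, not liftOver's implementation: both sides are on the + strand, coordinates are 0-based half-open, and the block list is hypothetical. In liftOver's .over.chain files the target side corresponds to the assembly being converted from.

```python
from typing import List, Optional, Tuple

# One ungapped aligned block of a chain: (t_start, q_start, size).
Block = Tuple[int, int, int]


def project(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int, int]]:
    """Project [start, end) from the input (target) genome through one chain's blocks.

    Returns (out_start, out_end, mapped_bases), or None if no base is aligned.
    Unaligned input bases are dropped, and gaps between blocks on the output
    side are absorbed into the output span, so the output can be longer or
    shorter than the input.
    """
    pieces = []
    for t_start, q_start, size in blocks:
        lo = max(start, t_start)
        hi = min(end, t_start + size)
        if lo >= hi:
            continue  # block does not overlap the input region
        pieces.append((q_start + (lo - t_start), q_start + (hi - t_start)))
    if not pieces:
        return None
    mapped = sum(e - s for s, e in pieces)
    return min(s for s, _ in pieces), max(e for _, e in pieces), mapped


# Hypothetical chain: a 20 bp gap in the input genome faces a 50 bp gap in the output.
blocks = [(100, 1000, 50), (170, 1100, 80)]
out_start, out_end, mapped = project(120, 200, blocks)
print(out_start, out_end, mapped, mapped / (200 - 120))
```

Here an 80 bp input region yields a 110 bp output span with 60 of 80 input bases aligned, so the ratio over input bases (0.75) says nothing about how much the output grew. Which of the two directions liftOver measures is exactly what question 1 asks.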
>>>>
>>>> We assume that the behavior of liftOver with respect to these can be
>>>> controlled using the following parameters:
>>>>
>>>> -minMatch=0.N           Minimum ratio of bases that must remap. Default 0.95
>>>> -minBlocks=0.N          Minimum ratio of alignment blocks/exons that must map
>>>>                         (default 1.00)
>>>> -fudgeThick             If thickStart/thickEnd is not mapped, use the closest
>>>>                         mapped base. Recommended if using -minBlocks.
>>>> -multiple               Allow multiple output regions
>>>> -minChainT, -minChainQ  Minimum chain size in target/query, when
>>>>                         mapping to multiple output regions (default 0, 0)
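When scripting many runs, it can help to build the liftOver argument list in one place. The Python sketch below uses only the flags listed above plus the positional arguments `oldFile map.chain newFile unMapped`. The `-name=value` form for -minChainT/-minChainQ is assumed by analogy with -minMatch, the file names are placeholders, and rejecting chain-size limits without -multiple is a deliberately strict choice based on the usage text, not liftOver behavior.

```python
from typing import List, Optional


def liftover_args(old_file: str, chain_file: str, new_file: str, unmapped_file: str, *,
                  min_match: Optional[float] = None,
                  min_blocks: Optional[float] = None,
                  fudge_thick: bool = False,
                  multiple: bool = False,
                  min_chain_t: int = 0,
                  min_chain_q: int = 0) -> List[str]:
    """Build an argument list for the liftOver command-line tool (hypothetical helper)."""
    for name, value in (("min_match", min_match), ("min_blocks", min_blocks)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a ratio between 0 and 1, got {value}")
    if (min_chain_t or min_chain_q) and not multiple:
        # The usage text ties -minChainT/-minChainQ to multiple-output mode.
        raise ValueError("min_chain_t/min_chain_q only apply together with multiple=True")
    args = ["liftOver"]
    if min_match is not None:
        args.append(f"-minMatch={min_match}")
    if min_blocks is not None:
        args.append(f"-minBlocks={min_blocks}")
    if fudge_thick:
        args.append("-fudgeThick")
    if multiple:
        args.append("-multiple")
    if min_chain_t:
        args.append(f"-minChainT={min_chain_t}")
    if min_chain_q:
        args.append(f"-minChainQ={min_chain_q}")
    return args + [old_file, chain_file, new_file, unmapped_file]


args = liftover_args("input.bed", "old_to_new.over.chain", "lifted.bed", "unmapped.bed",
                     min_match=0.1, multiple=True, min_chain_t=1000, min_chain_q=1000)
print(" ".join(args))
```

If the liftOver binary is on the PATH, the list can be passed to `subprocess.run(args, check=True)`.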
>>>>
>>>> Could you please give some details on what exactly the parameters do?
>>>> This is very important for us to know in order to use the tool
>>>> appropriately. For example:
>>>>
>>>> 1. What does "remap" mean for the minMatch parameter?
>>>> Is it the fraction of input bases that have a target counterpart,
>>>> i.e. that would appear aligned in a sequence alignment (or is it the
>>>> fraction of target bases that have an input counterpart)?
>>>>
>>>> When relaxing this parameter, we typically get more lifted regions.
>>>> Are these, however, still orthologous/unique, or will we run into a
>>>> specificity problem? I understand that liftOver only uses a
>>>> precomputed alignment (or coordinate lookup table) that, in principle,
>>>> only contains alignments between orthologous regions. In other words,
>>>> I do NOT expect liftOver to simply find more and more "matches" that
>>>> make less and less sense, as e.g. BLAST would when lowering its
>>>> specificity.
>>>>
>>>> 2. How does the minMatch parameter influence the growing and
>>>> shrinking of region length?
>>>> Does a more relaxed minMatch parameter allow for more variable region
>>>> length between input and target regions? In other words: if it only
>>>> assesses the fraction of input nucleotides that have a counterpart,
>>>> the region can grow freely but not shrink, and vice versa.
>>>>
>>>> 3. Will we "lose" regions?
>>>> When lowering minMatch, will regions that are uniquely mapped with a
>>>> stringent minMatch parameter map to multiple regions/blocks and thus
>>>> become unmapped?
>>>>
>>>> 4. Does "multiple" allow an input region to span multiple output
>>>> blocks, or does it allow non-unique mapping (of the same region)?
>>>>
>>>> 5. What do minChainT and minChainQ mean (i.e. what is a chain size,
>>>> etc.)?
>>>>
>>>> 6. What does minBlocks do? Does it apply to regions that span multiple
>>>> alignment blocks and require that the same number of alignment blocks
>>>> must be in the input and target region?
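One possible reading of -minBlocks ("Minimum ratio of alignment blocks/exons that must map") is sketched below in Python: count how many input blocks (e.g. BED12 exons) map and compare that fraction with the threshold. The rule for when a single block counts as mapped (any aligned base) is an assumption, as are all names and coordinates; in this toy model minBlocks is a ratio over input blocks rather than a requirement that the output have the same number of blocks, and whether liftOver behaves exactly this way is not confirmed in the thread.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open


def block_maps(block: Interval, aligned: List[Interval]) -> bool:
    """Toy rule (assumption): a block 'maps' if any of its bases is aligned."""
    s, e = block
    return any(max(s, a) < min(e, b) for a, b in aligned)


def min_blocks_ok(blocks: List[Interval], aligned: List[Interval],
                  min_blocks: float = 1.0) -> bool:
    """Compare the fraction of mapped input blocks with min_blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    mapped = sum(block_maps(b, aligned) for b in blocks)
    return mapped / len(blocks) >= min_blocks


exons = [(100, 200), (300, 400), (600, 700)]
aligned = [(90, 250), (320, 380)]  # the third exon has no aligned bases
print(min_blocks_ok(exons, aligned))                  # False with the default 1.00
print(min_blocks_ok(exons, aligned, min_blocks=0.6))  # True: 2/3 >= 0.6
```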
>>>>
>>>> Very many thanks for your help in advance and sorry for all the
>>>> questions.
>>>>
>>>> Best,
>>>>
>>>> Alex
>>>>
>>>> **********
>>>> Alexander Stark, PhD
>>>> Group Leader
>>>> Institute of Molecular Pathology (IMP)
>>>> Dr. Bohr-Gasse 7; 1030 Vienna
>>>> Austria
>>>>
>>>> Tel. +43 (1) 79730-3380
>>>> [email protected]
>>>> http://www.imp.ac.at/research/alexander-stark/
>>>>
>>>>
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome