Dear Kayla, thank you very much for your quick response, the explanations and the links. They were very helpful in understanding your pipeline and the concepts of chains and nets.
However, I didn't find an explanation of the command-line parameters of liftOver and how they affect the results. I'd be extremely happy if someone could find some time to look at my problems/questions.

Very many thanks in advance,

Alex

On Aug 21, 2009, at 1:12 AM, Kayla Smith wrote:

> Hello Alexander,
>
> Please see this previously answered mailing list question, "How does
> LiftOver Work?":
>
> https://lists.soe.ucsc.edu/pipermail/genome/2008-March/015810.html
>
> Hopefully this should answer some of your questions. If you still
> require assistance, please write back to us.
>
> Kayla Smith
> UCSC Genome Bioinformatics Group
>
> ----- "Alexander Stark" <[email protected]> wrote:
>
>> Hi all,
>>
>> we're using liftOver quite extensively to translate coordinates
>> between different species. In general, it seems to work quite well
>> for us and the results typically make sense when we inspect them
>> visually.
>>
>> However, sometimes we run into problems, especially for
>> coordinate-conversions between more distantly related species.
>> Unfortunately, we could not find a more detailed description of how
>> liftOver works (apart from the short help it prints) and what the
>> command-line parameters do - we hope someone can help.
>>
>> It is our understanding that liftOver essentially uses the UCSC
>> alignments (or the underlying data) for the conversions. This should
>> mean that any input region can map to 0, 1, or several contiguous
>> regions in the target genome, that the region length can change, and
>> that only a certain fraction of the input nucleotides correspond to
>> (i.e. map to) target nucleotides.
>>
>> We assume that the behavior of liftOver with respect to these can be
>> controlled using the following parameters:
>>
>> -minMatch=0.N Minimum ratio of bases that must remap. Default 0.95
>> -minBlocks=0.N Minimum ratio of alignment blocks/exons that must map
>>    (default 1.00)
>> -fudgeThick If thickStart/thickEnd is not mapped, use the closest
>>    mapped base. Recommended if using -minBlocks.
>> -multiple Allow multiple output regions
>> -minChainT, -minChainQ Minimum chain size in target/query, when
>>    mapping to multiple output regions (default 0, 0)
>>
>> Could you please give some details on what exactly the parameters do?
>> This is very important for us to know in order to use the tool
>> appropriately. For example:
>>
>> 1. What does "remap" mean for the minMatch parameter?
>> Is it the fraction of input bases that have a target counterpart,
>> i.e. that would appear aligned in a sequence alignment (or is it the
>> fraction of target bases that have an input counterpart)?
>>
>> When relaxing this parameter, we typically get more lifted regions.
>> Are these, however, still orthologous/unique, or will we run into a
>> specificity problem? I understand that liftOver only uses a
>> pre-computed alignment (or coordinate lookup table) that - in
>> principle - only contains alignments between orthologous regions. In
>> other words, I do NOT expect liftOver to simply find more and more
>> "matches" that make less and less sense, as e.g. blast would do when
>> lowering its specificity.
>>
>> 2. How does the minMatch parameter influence the growing & shrinking
>> of region-length?
>> Does a more relaxed minMatch parameter allow for more variable
>> region-length between input and target regions? In other words: if it
>> only assesses the fraction of input nucleotides that have a
>> counterpart, the region can grow freely but not shrink, and vice
>> versa.
>>
>> 3. Will we "lose" regions?
>> When lowering minMatch, will regions that are uniquely mapped with a
>> stringent minMatch parameter map to multiple regions/blocks and thus
>> become unmapped?
>>
>> 4. Does "multiple" allow an input region to span multiple output
>> blocks, or does it allow non-unique mapping (of the same region)?
>>
>> 5. What do minChainT and minChainQ mean (i.e. what is a chain size,
>> etc.)?
>>
>> 6. What does minBlocks do? Does it apply to regions that span
>> multiple alignment blocks and require that the same number of
>> alignment blocks be present in the input and target region?
>>
>> Very many thanks for your help in advance and sorry for all the
>> questions.
>>
>> Best,
>>
>> Alex
>>
>> **********
>> Alexander Stark, PhD
>> Group Leader
>> Institute of Molecular Pathology (IMP)
>> Dr. Bohr-Gasse 7; 1030 Vienna
>> Austria
>>
>> Tel. +43 (1) 79730-3380
>> [email protected]
>> http://www.imp.ac.at/research/alexander-stark/
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
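
To make the first reading in question 1 concrete (minMatch as the fraction of input bases that have an aligned counterpart), here is a minimal sketch. It only illustrates that reading and is not taken from the liftOver source; the function names, the half-open coordinate convention, and the block list are all assumptions made for the example.

```python
def remapped_fraction(region_start, region_end, blocks):
    """Fraction of bases in [region_start, region_end) covered by aligned blocks.

    `blocks` is a hypothetical list of (start, end) half-open intervals in
    input-genome coordinates, assumed not to overlap one another (as the
    gapless blocks of a single chain would not).
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered = 0
    for block_start, block_end in blocks:
        # Size of the intersection between the block and the input region.
        overlap = min(region_end, block_end) - max(region_start, block_start)
        if overlap > 0:
            covered += overlap
    return covered / length


def passes_min_match(region_start, region_end, blocks, min_match=0.95):
    """True if the covered fraction reaches the (assumed) minMatch threshold."""
    return remapped_fraction(region_start, region_end, blocks) >= min_match
```

Under this reading, a 100-base region at 100-200 with aligned blocks at 90-150 and 160-210 has 90 of its 100 bases covered, so it would fail at the default 0.95 and pass at 0.5. Note that this measure says nothing about the length of the output region, which is the point of question 2.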
