Hi Michael, great to see some discussion on this topic.
I'll try to give some answers. The boundaries of the homologous segments
are defined by an HMM on a pairwise basis among all genomes. The process
is described in a few of the publications, e.g. Treangen et al 2009 and
Darling et al 2010. Importantly, the boundaries end up being a bit
sloppy after taking the unions of pairwise homology predictions, so
the .backbone file can contain some small pieces that may or may not
actually be conserved. The command-line accessible parameters
controlling this process are --hmm-p-go-homologous,
--hmm-p-go-unrelated, --hmm-identity, and --island-gap-size. There are a
few more parameters that are currently only accessible via the source
code, such as the HMM posterior probability threshold used to call
homology. I've tried to set all these to sensible defaults, but it's
likely they could benefit from fine tuning in some cases.
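
As a rough sketch of how the four command-line parameters above could be
overridden in a scripted run, the Python helper below only assembles an
argument list. The --output= option and the --name=value spelling follow
progressiveMauve's usual command-line form; the helper name, the file names,
and the numeric values are illustrative placeholders, not defaults or
recommended settings, so check them against `progressiveMauve --help`
before use.

```python
# The four backbone/HMM options named in the message above.
TUNABLE = {
    "hmm-p-go-homologous",
    "hmm-p-go-unrelated",
    "hmm-identity",
    "island-gap-size",
}


def build_progressivemauve_cmd(genomes, output, overrides=None):
    """Return an argv list for progressiveMauve (hypothetical helper)."""
    cmd = ["progressiveMauve", f"--output={output}"]
    # Sort the overrides so the generated command is reproducible.
    for name, value in sorted((overrides or {}).items()):
        if name not in TUNABLE:
            raise ValueError(f"not a backbone parameter: {name}")
        cmd.append(f"--{name}={value}")
    # Input genome files go last.
    return cmd + list(genomes)


if __name__ == "__main__":
    # Placeholder values for illustration only, not defaults.
    cmd = build_progressivemauve_cmd(
        ["genome1.fasta", "genome2.fasta"],
        "aln.xmfa",
        {"hmm-identity": 0.8, "island-gap-size": 50},
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the
values have been chosen; rerunning with a few different settings and comparing
the .backbone files is one way to see how sensitive the segment boundaries
are.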

In terms of genome evolution inference, it's a fascinating and
challenging problem, where there is often great uncertainty regarding
the individual events in likelihood-based models. The data are generally
not informative enough to distinguish among alternative event histories,
but as was demonstrated in the Darling et al 2008 paper on Yersinia
genome rearrangements, common features among alternative histories can
be identified as well supported. I haven't tried MLGO, but if you'd like
to give it a go, you might be interested in the (experimental) new
feature in Mauve 2.4.0 which allows one to export a permutation matrix
via the GUI. I'm not sure the format will match what MLGO expects, but
subset blocks are represented in the matrix exported via the Mauve GUI.
However, keep in mind that progressiveMauve and other positional
homology genome aligners are not aligning or identifying all the various
homologous regions within individual genomes, so you won't get
duplication history inference. If you try this I'd be keen to hear about
how it goes!
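
For anyone who would rather derive signed block orders from the .backbone
file itself than from the GUI export, here is a minimal sketch. It assumes
the .backbone file is tab-delimited with one header row and a left/right
coordinate pair per genome, that "0 0" marks a genome lacking the segment,
and that negative coordinates mark reverse orientation; please verify this
against your own file. The function names and the min_length cutoff are
hypothetical, and the output is a plain list of signed block orders, which
may not be the format MLGO expects. The length cutoff is one way to drop the
small, possibly spurious pieces mentioned above.

```python
import csv
import io


def read_backbone(text, min_length=0):
    """Parse .backbone text into blocks of (left, right) pairs per genome."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the header row (seq0_leftend, seq0_rightend, ...)
    blocks = []
    for row in rows:
        if not row:
            continue
        coords = [int(x) for x in row]
        pairs = list(zip(coords[0::2], coords[1::2]))
        # Lengths of the copies that are actually present (not "0 0").
        lengths = [
            abs(abs(r) - abs(l)) + 1 for l, r in pairs if (l, r) != (0, 0)
        ]
        # Keep the block only if its shortest copy reaches the cutoff.
        if lengths and min(lengths) >= min_length:
            blocks.append(pairs)
    return blocks


def signed_orders(blocks):
    """Return one signed block order per genome; block ids start at 1."""
    n_genomes = len(blocks[0]) if blocks else 0
    orders = []
    for g in range(n_genomes):
        present = []
        for block_id, pairs in enumerate(blocks, start=1):
            left, right = pairs[g]
            if (left, right) == (0, 0):
                continue  # block absent from this genome (subset block)
            position = min(abs(left), abs(right))
            present.append((position, block_id, left < 0))
        present.sort()
        orders.append([-b if rev else b for _, b, rev in present])
    return orders
```

For example, read_backbone(open("aln.xmfa.backbone").read(), min_length=100)
followed by signed_orders(...) gives one list per genome, where a negative
id means the block is inverted relative to its stored orientation and a
missing id means the block is absent from that genome. The file name here is
only an example.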

One other thought on this topic: depending on the rates of gene transfer
by homologous recombination in your study organisms, it might be
fruitful to constrain the phylogeny used by tools like MLGO to the tree
inferred from nucleotide substitution data.

Best,
-Aaron

On Tue, 2015-02-10 at 17:24 +0000, Weigand, Michael Richard
(CDC/OID/NCIRD) (CTR) wrote:
> Hello Mauve users,
>
> I have aligned ~40 complete genomes from isolates of the same bacterial
> species using the progressiveMauve command-line application and the
> unannotated FASTA genome sequences. These genomes are closely related,
> but often differ in their organization owing to a number of
> rearrangements (inversions) and IS-element insertions. I’m interested
> in further analyzing the .backbone file to reconstruct possible
> phylogenetic relationships between these genomes based on genome
> order/organization and content, which has led me to the following
> questions:
>
> 1. What parameters most determine the stringency of defining
> homology between conserved segments (and what are the defaults)?
> Meaning, what dictates the cut-off between defining segments from two
> genomes as being the ‘same’ (same row of .backbone) or
> ‘different’ (different rows)?
>
> 2. Have others tried to ask similar questions with the .backbone
> output? I know there are tools for inferring rearrangement histories
> using BADGER/GRAPPA-like permutation matrices, including MLGO which
> now allows blocks shared among only a subset of genomes. And so it
> seems proper tuning of the parameters for segment homology definition
> could provide improved resolution to such phylogenies.
>
> Thanks for any thoughts/feedback you might have.
>
> ====================
> Michael R. Weigand, PhD
> Bioinformatician | IHRC
> NCIRD/DBD/MVPDB
> Centers for Disease Control and Prevention
> mweig...@cdc.gov
> 404.639.2473
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Mauve-users mailing list
> Mauve-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mauve-users

-- 
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia

http://darlinglab.org
twitter: @koadman





