Hi Michael,

Great to see some discussion on this topic. I'll try to give some answers.

The boundaries of the homologous segments are defined by an HMM applied on a pairwise basis among all genomes. The process is described in a few of the publications, e.g. Treangen et al. 2009 and Darling et al. 2010. Importantly, the boundaries end up being a bit sloppy after taking the unions of the pairwise homology predictions, so the .backbone file can contain some small pieces that may or may not actually be conserved. The command-line accessible parameters controlling this process are --hmm-p-go-homologous, --hmm-p-go-unrelated, --hmm-identity, and --island-gap-size. There are a few more parameters that are currently only accessible via the source code, such as the HMM posterior probability threshold used to call homology. I've tried to set all of these to sensible defaults, but it's likely they could benefit from fine-tuning in some cases.
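Since the small backbone pieces can be unreliable, one option is to drop them before building any order/content matrix. Below is a minimal sketch in Python, not part of Mauve itself. It assumes the .backbone file is tab-delimited with one header row and one leftend/rightend column pair per genome, where 0 means the segment is absent from that genome and negative coordinates mean reverse orientation. The function names, the 50 bp threshold, and the example data are all made up for illustration.

```python
import csv
import io

# Illustrative data only (not from a real alignment): 2 genomes, 4 segments.
EXAMPLE = (
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\n"
    "1\t1000\t1\t1000\n"
    "1001\t1010\t0\t0\n"
    "1011\t3000\t-1001\t-2500\n"
    "3001\t4000\t2501\t3500\n"
)


def read_backbone(text):
    """Parse backbone text into segments: one list of (left, right) per row."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows)  # skip the seqN_leftend / seqN_rightend header line
    segments = []
    for row in rows:
        coords = [int(v) for v in row if v.strip()]
        if coords:
            segments.append(list(zip(coords[0::2], coords[1::2])))
    return segments


def filter_short(segments, min_len=50):
    """Keep segments whose shortest present copy spans at least min_len bp."""
    kept = []
    for seg in segments:
        # Absent copies are (0, 0); use absolute values for reversed copies.
        lengths = [abs(abs(r) - abs(l)) + 1 for l, r in seg if l != 0]
        if lengths and min(lengths) >= min_len:
            kept.append(seg)
    return kept


def signed_orders(segments):
    """Per genome, list 1-based segment ids in genome order, negative if reversed."""
    n_genomes = len(segments[0]) if segments else 0
    orders = []
    for g in range(n_genomes):
        present = []
        for i, seg in enumerate(segments):
            left, right = seg[g]
            if left == 0:
                continue  # segment absent from this genome
            position = min(abs(left), abs(right))
            present.append((position, i + 1 if left > 0 else -(i + 1)))
        orders.append([seg_id for _, seg_id in sorted(present)])
    return orders


if __name__ == "__main__":
    kept = filter_short(read_backbone(EXAMPLE), min_len=50)
    for genome, order in enumerate(signed_orders(kept)):
        print(f"genome {genome}: {order}")
```

With the made-up example, the 10 bp segment present only in the first genome is dropped, and the second genome's order comes out as 1, -2, 3, i.e. a single inversion relative to the first. Whether the resulting signed orders match what a tool like MLGO expects as input is something you would need to check against that tool's documentation.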
In terms of genome evolution inference, it's a fascinating and challenging problem, where there is often great uncertainty regarding the individual events in likelihood-based models. The data are generally not informative enough to distinguish among alternative event histories, but as was demonstrated in the Darling et al. 2008 paper on Yersinia genome rearrangements, features common to the alternative histories can be identified as well supported.

I haven't tried MLGO, but if you'd like to give it a go, you might be interested in the (experimental) new feature in Mauve 2.4.0 which allows one to export a permutation matrix via the GUI. I'm not sure the format will match what MLGO expects, but subset blocks are represented in the exported matrix. However, keep in mind that progressiveMauve and other positional homology genome aligners do not align or identify all the various homologous regions within individual genomes, so you won't get duplication history inference. If you try this I'd be keen to hear how it goes!

One other thought on this topic: depending on the rates of gene transfer by homologous recombination in your study organisms, it might be fruitful to constrain the phylogeny used by tools like MLGO to the tree inferred from nucleotide substitution data.

Best,
-Aaron

On Tue, 2015-02-10 at 17:24 +0000, Weigand, Michael Richard (CDC/OID/NCIRD) (CTR) wrote:
> Hello Mauve users,
>
> I have aligned ~40 complete genomes from isolates of the same bacterial
> species using the progressiveMauve command-line application and the
> unannotated FASTA genome sequences. These genomes are closely related,
> but often differ in their organization owing to a number of
> rearrangements (inversions) and IS-element insertions. I’m interested
> to further analyze the .backbone file to reconstruct possible
> phylogenetic relationships between these genomes based on genome
> order/organization and content, which has led me to the following
> questions:
>
> 1. What parameters most determine the stringency of defining
> homology between conserved segments (and what are the defaults)?
> Meaning, what dictates the cut-off between defining segments from two
> genomes as being the ‘same’ (same row of .backbone) or
> ‘different’ (different rows)?
>
> 2. Have others tried to ask similar questions with the .backbone
> output? I know there are tools for inferring rearrangement histories
> using BADGER/GRAPPA-like permutation matrices, including MLGO which
> now allows blocks shared among only a subset of genomes. And so it
> seems proper tuning of the parameters for segment homology definition
> could provide improved resolution to such phylogenies.
>
> Thanks for any thoughts/feedback you might have.
>
> ====================
> Michael R. Weigand, PhD
> Bioinformatician | IHRC
> NCIRD/DBD/MVPDB
> Centers for Disease Control and Prevention
> mweig...@cdc.gov
> 404.639.2473

--
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia
http://darlinglab.org
twitter: @koadman

_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users