Hi Aaron,

Thanks for the feedback and suggestions. I'm trying to use the progressiveMauve 
multiple-genome alignment as a means toward a couple of different 'ends', all of 
which depend on accurately identifying segments in .backbone as either 
conserved or unique among a set of genome sequences. So I'm trying to think of 
how I might evaluate that 'accuracy' in order to optimize the HMM parameters. 
My initial plan is to compare within- vs. between-segment sequence 
homology/identity for a small number of genomes. The assumption is that these 
genomes are very closely related, so most orthologous segments should be nearly 
identical, and segments that should be matched will show high identity. But 
given the 'sloppiness' you mention, perhaps there is a limit to the sequence 
length resolution that can be achieved (?), and there may need to be a minimum 
length cut-off.
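
To make the length cut-off idea concrete, here is a minimal Python sketch. It 
assumes the .backbone file is a tab-delimited table with one header line and a 
left-end/right-end column pair per genome, where 0/0 marks a segment absent 
from a genome and negative coordinates mark reverse orientation; please check 
that against your own file. The function names and the 50 bp threshold are my 
own placeholders, not anything defined by Mauve.

```python
import csv
import io

# Toy table in the assumed .backbone layout (three genomes, three segments).
EXAMPLE = (
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\tseq2_leftend\tseq2_rightend\n"
    "1\t1000\t1\t1005\t-1001\t-2000\n"
    "1001\t1500\t0\t0\t0\t0\n"
    "1501\t1510\t1006\t1015\t0\t0\n"
)

def read_backbone(handle):
    """Return one list of (left, right) pairs per segment row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    rows = []
    for fields in reader:
        coords = [int(x) for x in fields if x.strip()]
        if not coords:
            continue  # ignore blank lines
        rows.append([(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)])
    return rows

def segment_length(pair):
    """Length in bp of one genome's copy; 0 if the segment is absent (0, 0)."""
    left, right = pair
    if left == 0 and right == 0:
        return 0
    return abs(abs(right) - abs(left)) + 1

def classify(row, min_len):
    """Label a row 'too_short', 'core', 'unique', or 'subset'.

    A row is 'too_short' when its shortest present copy is below min_len;
    that rule is an arbitrary choice for illustration.
    """
    present = [n for n in (segment_length(p) for p in row) if n > 0]
    if not present or min(present) < min_len:
        return "too_short"
    if len(present) == len(row):
        return "core"
    if len(present) == 1:
        return "unique"
    return "subset"
```

Calling classify(row, 50) on each row from read_backbone(io.StringIO(EXAMPLE)) 
labels every segment, and repeating that over a range of min_len values would 
show how sensitive the core/unique split is to the cut-off.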



Once optimized, I'm hoping these data can be used for a CDS-independent 
core/pan-genome analysis to reveal potential clade-specific markers. At the 
same time, I'm hoping these data can also be used for inferring changes in 
genome structure. From what I can tell, the permutation matrix output by the 
GUI includes only LCBs, which themselves are (more or less) 'merged' segments 
from .backbone (is that fair?). So it seems to me that by optimizing the 
definition of homologous/unique segments, you could use the .backbone file 
directly to generate a permutation matrix with a higher number of informative 
sites, which could lead to a 'higher resolution' history of the genome. The 
goal here is similar to that stated above: to reveal clade-specific changes in 
genome organization (both structure and content). But as noted above, the 
'sloppiness' of the pairwise HMM could be a limitation.
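
As a rough illustration of what I mean by building permutations straight from 
.backbone, here is a short Python sketch. It assumes each backbone row has 
already been parsed into one (left, right) pair per genome, with (0, 0) 
meaning absent and negative coordinates meaning reverse orientation. The 
function name and the choice to use the 1-based row number as the block ID are 
my own assumptions, and I have not checked whether the result matches the 
input format that MLGO or the Mauve GUI export uses.

```python
def signed_permutations(rows, n_genomes):
    """Return, per genome, the signed block IDs in genomic order.

    rows: list of backbone rows, each a list of (left, right) pairs,
          one pair per genome. Block IDs are 1-based row numbers.
    Blocks absent from a genome, recorded as (0, 0), are simply omitted.
    """
    perms = []
    for g in range(n_genomes):
        placed = []
        for block_id, row in enumerate(rows, start=1):
            left, right = row[g]
            if left == 0 and right == 0:
                continue  # block not present in this genome
            sign = -1 if left < 0 else 1
            start = min(abs(left), abs(right))  # position used for ordering
            placed.append((start, sign * block_id))
        placed.sort()
        perms.append([block for _, block in placed])
    return perms


# Toy example: two genomes, block 2 inverted and moved in the second genome,
# block 4 present only in the first.
toy_rows = [
    [(1, 100), (1, 100)],
    [(101, 200), (-201, -300)],
    [(201, 300), (101, 200)],
    [(301, 350), (0, 0)],
]
toy_perms = signed_permutations(toy_rows, 2)
```

Applying a minimum-length filter before this step would probably matter, since 
every small, doubtfully conserved segment becomes its own block.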



I appreciate your thoughts on this topic.



Mike







Date: Wed, 11 Feb 2015 11:55:42 +1100
From: Aaron Darling <aaron.darl...@uts.edu.au>
Subject: Re: [Mauve-users] Further analysis of progressiveMauve .backbone file
To: mauve-users@lists.sourceforge.net
Message-ID: <1423616142.21100.170.camel@calumet>
Content-Type: text/plain;  charset="utf-8"



Hi Michael, great to see some discussion on this topic.

I'll try to give some answers. The boundaries of the homologous segments are 
defined by an HMM on a pairwise basis among all genomes. The process is 
described in a few of the publications, e.g. Treangen et al 2009 and Darling et 
al 2010. Importantly, the boundaries end up being a bit sloppy after taking the 
unions of pairwise homology predictions, so the .backbone file can contain some 
small pieces that may or may not actually be conserved. The command-line 
accessible parameters controlling this process are --hmm-p-go-homologous, 
--hmm-p-go-unrelated, --hmm-identity, and --island-gap-size. There are a few more 
parameters that are currently only accessible via the source code, such as the 
HMM posterior probability threshold used to call homology. I've tried to set 
all these to sensible defaults, but it's likely they could benefit from fine 
tuning in some cases.
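
For anyone wanting to explore those settings systematically, a small Python 
sketch of a parameter sweep might look like the following. It only builds the 
command lines and does not run them. The four HMM/island option names are the 
ones listed above; the --output and --backbone-output options, the 
--option=value syntax, the function name, and the numeric values are 
assumptions for illustration and should be checked against 
progressiveMauve --help before use.

```python
import itertools

def sweep_commands(genomes, identities, gap_sizes, exe="progressiveMauve"):
    """Build one argument list per (hmm-identity, island-gap-size) combination.

    Nothing is executed here; each list could be handed to subprocess.run().
    Output file names are derived from the parameter values.
    """
    commands = []
    for identity, gap in itertools.product(identities, gap_sizes):
        tag = f"id{identity}_gap{gap}"
        commands.append([
            exe,
            f"--hmm-identity={identity}",
            f"--island-gap-size={gap}",
            f"--output={tag}.xmfa",               # assumed option name
            f"--backbone-output={tag}.backbone",  # assumed option name
            *genomes,
        ])
    return commands


# Illustrative values only; these are not recommended settings.
example_commands = sweep_commands(["a.fasta", "b.fasta"], [0.7, 0.8], [20, 50])
```

Each resulting .backbone file could then be scored with whatever accuracy 
measure is chosen, so the settings can be compared on the same genomes.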



In terms of genome evolution inference, it's a fascinating and challenging 
problem, where there is often great uncertainty regarding the individual events 
in likelihood-based models. The data are generally not informative enough to 
distinguish among alternative event histories, but as was demonstrated in the 
Darling et al 2008 paper on Yersinia genome rearrangements, common features 
among alternative histories can be identified as well supported. I haven't 
tried MLGO, but if you'd like to give it a go, you might be interested in the 
(experimental) new feature in Mauve 2.4.0 which allows one to export a 
permutation matrix via the GUI. I'm not sure the format will match what MLGO 
expects, but subset blocks are represented in the matrix exported via the Mauve 
GUI.

However, keep in mind that progressiveMauve and other positional homology 
genome aligners are not aligning or identifying all the various homologous 
regions within individual genomes, so you won't get duplication history 
inference. If you try this I'd be keen to hear about how it goes!



One other thought on this topic: depending on the rates of gene transfer by 
homologous recombination in your study organisms, it might be fruitful to 
constrain the phylogeny used by tools like MLGO to the tree inferred from 
nucleotide substitution data.



Best,

-Aaron



On Tue, 2015-02-10 at 17:24 +0000, Weigand, Michael Richard
(CDC/OID/NCIRD) (CTR) wrote:
> Hello Mauve users,
>
> I have aligned ~40 complete genomes from isolates of the same bacterial
> species using the progressiveMauve command-line application and the
> unannotated FASTA genome sequences. These genomes are closely related,
> but often differ in their organization owing to a number of
> rearrangements (inversions) and IS-element insertions. I'm interested
> in further analyzing the .backbone file to reconstruct possible
> phylogenetic relationships between these genomes based on genome
> order/organization and content, which has led me to the following
> questions:
>
> 1.      What parameters most determine the stringency of defining
> homology between conserved segments (and what are the defaults)?
> Meaning, what dictates the cut-off between defining segments from two
> genomes as being the 'same' (same row of .backbone) or 'different'
> (different rows)?
>
> 2.      Have others tried to ask similar questions with the .backbone
> output? I know there are tools for inferring rearrangement histories
> using BADGER/GRAPPA-like permutation matrices, including MLGO, which
> now allows blocks shared among only a subset of genomes. And so it
> seems proper tuning of the parameters for segment homology definition
> could provide improved resolution to such phylogenies.
>
> Thanks for any thoughts/feedback you might have.
>
> ====================
> Michael R. Weigand, PhD
> Bioinformatician | IHRC
> NCIRD/DBD/MVPDB
> Centers for Disease Control and Prevention
> mweig...@cdc.gov
> 404.639.2473

_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
