Hi Aaron,
Thanks for the feedback and suggestions. I'm trying to use the progressiveMauve
multiple-genome alignment as a means toward a couple of different 'ends', all of
which depend on accurately identifying segments in .backbone as either
conserved or unique among a set of genome sequences. So I'm trying to think of
how I might evaluate such 'accuracy' in order to optimize the HMM parameters.
My initial plan is to compare within- vs between-segment sequence
homology/identity with a small number of genomes. The assumption is that these
genomes are very closely related, so most orthologous segments should be nearly
identical, and so segments that should be matched will show high homology. But
given the 'sloppiness' you mention, perhaps there is a limit to the sequence
length resolution that can be achieved (?) and there may need to be a minimum
length cut-off.
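
A minimal sketch of such a length cut-off. It assumes the .backbone file is
tab-delimited with one header line and, for each genome, a left-end/right-end
column pair, where 0/0 means the segment is absent and negative coordinates
mean reverse orientation. The function names and the cut-off value are
illustrative only, not part of Mauve.

```python
import csv
import io


def read_backbone(text):
    """Parse .backbone text into rows of (left, right) pairs, one pair per genome.

    Assumed layout: a header line, then tab-delimited integer coordinates.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header line (seq0_leftend, seq0_rightend, ...)
    for fields in reader:
        values = [int(v) for v in fields if v.strip()]
        if values:
            rows.append(list(zip(values[0::2], values[1::2])))
    return rows


def segment_length(interval):
    """Length in bp of one genome's interval; 0 if the segment is absent."""
    left, right = interval
    if left == 0 and right == 0:
        return 0
    return abs(right) - abs(left) + 1


def filter_by_min_length(rows, min_length):
    """Keep rows whose shortest present interval is at least min_length bp."""
    kept = []
    for row in rows:
        lengths = [segment_length(iv) for iv in row if iv != (0, 0)]
        if lengths and min(lengths) >= min_length:
            kept.append(row)
    return kept
```

Running the filter at several cut-offs and comparing how many rows survive
would be one way to see where the small, possibly spurious pieces drop out.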
Once optimized, I'm hoping these data can be used for a CDS-independent
core/pan-genome analysis to reveal potential clade-specific markers.
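
The presence/absence side of such an analysis could be sketched as follows.
Each backbone row is represented here as a list of (left, right) pairs, one per
genome, with (0, 0) assumed to mean "absent". The labels "core", "subset" and
"unique" and the helper names are illustrative, not Mauve terminology.

```python
def presence_pattern(row):
    """Tuple of booleans: True where the genome carries this segment."""
    return tuple(left != 0 or right != 0 for left, right in row)


def classify(row):
    """Label a row by how many genomes carry it."""
    n_present = sum(presence_pattern(row))
    if n_present == len(row):
        return "core"
    if n_present == 1:
        return "unique"
    return "subset"


def markers_for_clade(rows, clade):
    """Rows present in every genome of `clade` (a set of genome indices)
    and absent from every other genome."""
    if not rows:
        return []
    wanted = tuple(i in clade for i in range(len(rows[0])))
    return [row for row in rows if presence_pattern(row) == wanted]
```

Because the rows are defined by coordinates rather than annotated genes, this
kind of summary does not depend on CDS calls.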
At the same time, I'm hoping these data can also then be used for inferring
changes in genome structure. From what I can tell, the permutation matrix
output by the GUI includes only LCBs, which themselves are (more or less)
'merged' segments from .backbone (is that fair?). And so it seems to me that by
optimizing homologous/unique segment definition you could use the .backbone
file directly to generate a permutation matrix with a higher number of
informative sites, which could lead to a 'higher resolution' history of the
genome. The goal here is similar to that stated above: to reveal
clade-specific changes in genome organization (both structure and content). But
as noted above, the 'sloppiness' of the pairwise HMM could be a limitation.
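
One way to derive signed permutations directly from backbone rows is sketched
below. It assumes each row is a list of (left, right) pairs per genome, that
(0, 0) means absent, and that negative coordinates mean reverse orientation.
Block IDs are simply row numbers, and the result is a plain list of signed
integers per genome; it is not claimed to match the format that the Mauve GUI
exports or that MLGO expects.

```python
def signed_permutations(rows):
    """Return one signed block order per genome.

    Blocks are numbered 1..N by row; a negative number means the block
    is in reverse orientation in that genome. Absent blocks are skipped,
    so genomes may have permutations of different lengths.
    """
    n_genomes = len(rows[0]) if rows else 0
    perms = []
    for g in range(n_genomes):
        present = []
        for block_id, row in enumerate(rows, start=1):
            left, right = row[g]
            if left == 0 and right == 0:
                continue  # block absent from this genome
            sign = -1 if left < 0 else 1
            present.append((abs(left), sign * block_id))
        present.sort()  # order blocks by position along the genome
        perms.append([signed for _, signed in present])
    return perms
```

Short rows would contribute extra "sites" here, so applying a minimum length
filter before building the permutations is probably sensible.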
I appreciate your thoughts on this topic.
Mike
Date: Wed, 11 Feb 2015 11:55:42 +1100
From: Aaron Darling <aaron.darl...@uts.edu.au>
Subject: Re: [Mauve-users] Further analysis of progressiveMauve .backbone file
To: mauve-users@lists.sourceforge.net
Message-ID: <1423616142.21100.170.camel@calumet>
Content-Type: text/plain; charset="utf-8"
Hi Michael, great to see some discussion on this topic.
I'll try to give some answers. The boundaries of the homologous segments are
defined by an HMM on a pairwise basis among all genomes. The process is
described in a few of the publications, e.g. Treangen et al 2009 and Darling et
al 2010. Importantly, the boundaries end up being a bit sloppy after taking the
unions of pairwise homology predictions, so the .backbone file can contain some
small pieces that may or may not actually be conserved. The command-line
accessible parameters controlling this process are --hmm-p-go-homologous,
--hmm-p-go-unrelated, --hmm-identity, and --island-gap-size. There are a few more
parameters that are currently only accessible via the source code, such as the
HMM posterior probability threshold used to call homology. I've tried to set
all these to sensible defaults, but it's likely they could benefit from fine
tuning in some cases.
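
A hedged sketch of how the four command-line parameters above might be swept
from a script. The four HMM/island option names are the ones listed above. The
--output and --backbone-output options and the `--name=value` syntax are
assumptions to check against `progressiveMauve --help`. The numeric values in
any grid are placeholders, not recommended settings or defaults.

```python
import itertools


def build_command(genomes, prefix, **options):
    """Build a progressiveMauve argument list.

    Keyword names use underscores (e.g. hmm_identity) and are turned
    into --hmm-identity=<value> style options.
    """
    cmd = [
        "progressiveMauve",
        f"--output={prefix}.xmfa",               # assumed option name
        f"--backbone-output={prefix}.backbone",  # assumed option name
    ]
    for name, value in sorted(options.items()):
        cmd.append(f"--{name.replace('_', '-')}={value}")
    cmd.extend(genomes)
    return cmd


def parameter_sweep(genomes, grid):
    """Yield one command per combination of values in `grid`,
    a dict mapping option name to a list of candidate values."""
    names = sorted(grid)
    combos = itertools.product(*(grid[n] for n in names))
    for i, combo in enumerate(combos):
        yield build_command(genomes, f"run{i}", **dict(zip(names, combo)))
```

Each yielded list could be passed to `subprocess.run`, and the resulting
.backbone files compared across runs.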
In terms of genome evolution inference, it's a fascinating and challenging
problem, where there is often great uncertainty regarding the individual events
in likelihood based models. The data are generally not informative enough to
distinguish among alternative event histories, but as was demonstrated in the
Darling et al 2008 paper on Yersinia genome rearrangements, common features
among alternative histories can be identified as well supported. I haven't
tried MLGO, but if you'd like to give it a go, you might be interested in the
(experimental) new feature in Mauve 2.4.0 which allows one to export a
permutation matrix via the GUI. I'm not sure the format will match what MLGO
expects, but subset blocks are represented in the matrix exported via the Mauve
GUI.
However, keep in mind that progressiveMauve and other positional homology
genome aligners are not aligning or identifying all the various homologous
regions within individual genomes, so you won't get duplication history
inference. If you try this I'd be keen to hear about how it goes!
One other thought on this topic: depending on the rates of gene transfer by
homologous recombination in your study organisms, it might be fruitful to
constrain the phylogeny used by tools like MLGO to the tree inferred from
nucleotide substitution data.
Best,
-Aaron
On Tue, 2015-02-10 at 17:24 +0000, Weigand, Michael Richard
(CDC/OID/NCIRD) (CTR) wrote:
> Hello Mauve users,
>
>
>
> I have aligned ~40 complete genomes from isolates of the same bacterial
> species using the progressiveMauve command-line application and the
> unannotated FASTA genome sequences. These genomes are closely related,
> but often differ in their organization owing to a number of
> rearrangements (inversions) and IS-element insertions. I'm interested
> to further analyze the .backbone file to reconstruct possible
> phylogenetic relationships between these genomes based on genome
> order/organization and content, which has led me to the following
> questions:
>
>
>
> 1. What parameters most determine the stringency of defining
> homology between conserved segments (and what are the defaults)?
> Meaning, what dictates the cut-off between defining segments from two
> genomes as being the 'same' (same row of .backbone) or 'different'
> (different rows)?
>
>
>
> 2. Have others tried to ask similar questions with the .backbone
> output? I know there are tools for inferring rearrangement histories
> using BADGER/GRAPPA-like permutation matrices, including MLGO which
> now allows blocks shared among only a subset of genomes. And so it
> seems proper tuning of the parameters for segment homology definition
> could provide improved resolution to such phylogenies.
>
>
>
> Thanks for any thoughts/feedback you might have.
>
>
>
> ====================
>
> Michael R. Weigand, PhD
>
> Bioinformatician | IHRC
>
> NCIRD/DBD/MVPDB
>
> Centers for Disease Control and Prevention
>
> mweig...@cdc.gov
>
> 404.639.2473
_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users