April 24, 2007
TO: the Morphmet readership
FROM: Fred Bookstein
RE: Mantel tests and morphometrics
At the American Association of Physical Anthropologists congress in
Philadelphia last month, Mantel tests of Procrustes distance against
some other "distance" or "dissimilarity" appeared a few times in the
course of presentations about the relation of shape to other
sorts of biological processes (such as evolution or
ecophenotypy). I thought the Mantel test was
a really poor choice in every example I saw. That suggested that its
use was not the fault of the individual presenters but
instead owed to some systematic misunderstanding of
the maneuver by our community in the first place.
An argument might then be possible that would persuade a potential
user to abandon the idea of exploiting this test well
prior to any inspection of data. This is a first draft
of that argument.
The Mantel test (named for Nathan Mantel, 1919 - 2002,
American epidemiologist) applies
to data in the form of two (or more) distance matrices on the same
sample of cases (in the original application, cases of a disease
under study). Nowadays it takes the form of a permutation computation
for assessing the observed correlation of
entries between the distance matrices. Because distances are correlated
across the rows (or columns) of a distance matrix, the permutation
procedure needs to be clever: it must scramble the rows _and_
columns together. In Splus notation, this is something like
    junk <- sample(N); permdist <- dist[junk, junk]
where the operator "sample(N)" returns a random permutation of the
integers from 1 to N (sample size). A Mantel test is said to be
"statistically significant" (but I will argue below that this statement
is logically incoherent) just when the correlation of the off-diagonal
parts of the two distance matrices is greater than the 0.95 fractile, or
whatever you choose, in the distribution of the same correlation computed
with one matrix replaced by each of these permuted pseudodistance matrices.
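The procedure just described can be sketched in a few lines of Python. This is an illustration only, not the code behind any result in this note: the function names, the default of 500 permutations, the seed, and the one-sided tail convention (exceedances divided by permutations) are all assumptions made for the example. A two-tailed version would compare absolute values instead.

```python
import random
from math import sqrt


def offdiag(d):
    """Entries above the diagonal of a square symmetric matrix, as a flat list."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(u, v):
    """Ordinary Pearson correlation of two equal-length lists."""
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def mantel(d1, d2, n_perm=500, seed=0):
    """Observed off-diagonal correlation and its one-sided permutation tail."""
    rng = random.Random(seed)
    n = len(d1)
    v2 = offdiag(d2)
    r_obs = pearson(offdiag(d1), v2)
    exceed = 0
    for _ in range(n_perm):
        junk = rng.sample(range(n), n)  # random permutation of 0..n-1
        # Scramble rows AND columns together, as in dist[junk, junk].
        perm = [[d1[a][b] for b in junk] for a in junk]
        if pearson(offdiag(perm), v2) >= r_obs:
            exceed += 1
    return r_obs, exceed / n_perm
```

Calling mantel(d1, d2) on two N x N lists of lists returns the observed correlation together with the fraction of permuted pseudodistance matrices that match or exceed it.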
At the AAPA meeting, presenters were applying Mantel tests to
Procrustes distance matrices against other interspecimen or
intersample distance or dissimilarity matrices and announcing
either (i) that the null hypothesis was rejected, in which case the
two distances must have had something to do with each other, or (ii)
that the null hypothesis was not rejected, in which case the presenter
just went on to the next slide.
My purpose in this note is to open a thread by arguing
that this whole approach -- the application of a Mantel test in
connection with Procrustes distance -- is a bad idea, and should never
be used. My argument will have two parts: the problem of the alternative
with respect to which the Mantel permutation distribution is the null,
and the remarkably low power of the Mantel procedure itself in comparison
to alternatives that look more like our regular biometrical hypotheses.
Following the second there is, as is my wont, a concluding sermon.
Historical note. My argument here appears at first to be the opposite
of an argument published a few years ago by Legendre's group at McGill
(Dutilleul, Stockwell, Frigon, and Legendre, "The Mantel Test versus
Pearson's Correlation Analysis: Assessment of the Differences for
Biological and Environmental Studies,"
Journal of Agricultural, Biological, and Environmental Statistics
5:131-150, 2000). The Dutilleul et al. simulations
concern autocorrelated blocks X, Y consisting of only a single
variable each, a special case unlikely to be of any direct
relevance to geometric morphometrics [GMM] since shape coordinates come in
twos or threes. The context of diffusion presumably accounts for that
postulated autocorrelation structure.
My assertion, in particular, that morphometric hypotheses
cannot be validly stated in terms of "distance or closeness" seems
inconsistent with the assumption of Dutilleul et al. (p. 149) that
"users of the Mantel test should be fully aware that the hypotheses
tested are stated in terms of distances or closeness," but the discrepancy
can be resolved by inferring that they
(the users) are not doing morphometrics. Certainly
the authors' conclusion that "under the multivariate normal model for
the raw data, the use of squared Euclidean distances in the Mantel
test provides a situation in which the Mantel test and Pearson's
correlation test agree" does not appear to extend to the multivariate
model to be introduced under (**) here.
1. The Mantel test is not a "test of association" as we biometricians
usually use that phrase. "Association" is between variables measured on one
case, specimen, or group at a time, and the Mantel test doesn't talk about
those at all; it talks about distances or dissimilarities relating _two_
cases or specimens or groups. So, no matter what the permutation procedure,
it can't be thought of as a significance test. Let me take a little
more space to explain this in more detail.
For a null-hypothesis significance test to make sense, there
need to be two probabilistic models, not one, and the models have
to generate the data (i.e. the values we observe case by case).
In the first, "null" model, usually some
parameter (scalar or vector) of the population is zero; in the second
model, it is not. In Mantel's original application, one distance was
geographical and the other temporal.
The alternative hypothesis was then, at least in
principle, a pretty rigorous one: that the disease cluster under
study was disseminated in part by diffusion. (Models like this
had appeared in the evolutionary literature as early as the 1940's,
in some brilliant data analyses by Sewall Wright.) Assuming vectors
without jump processes (i.e. no jet travel, no macromutations),
the variance of a diffusion process is linear in time. There follows a fairly
strict alternate model in the form of a regression of squared
geographical distance on time, a regression that would have to have
constant slope and zero intercept. (Mantel himself may not have fit models
like these, but Wright did, regressing phenetic distance on geographic
distance, with time the implicit predictor accounting for both.)
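The alternative model above, squared distance regressed on time with zero intercept, has a one-line least-squares estimate. The Python sketch below is illustrative only; the function name is hypothetical, and the numbers in the usage line are made up for the example, not data from any study.

```python
def slope_through_origin(t, d2):
    """Least-squares slope of d2 on t with the intercept forced to zero.

    Minimizing sum((d2 - c * t)**2) over c gives c = sum(t * d2) / sum(t * t).
    """
    return sum(a * b for a, b in zip(t, d2)) / sum(a * a for a in t)


# Hypothetical elapsed times and squared distances, for illustration only.
rate = slope_through_origin([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```

The estimate is in units of squared distance per unit time, which is the kind of coefficient a diffusion model asks for, and it is not a correlation.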
The applications at AAPA that I saw never took this form.
In these talks there was, as best I recall, no alternate model,
no statement about what the relationship between the two distance
matrices should look like (i.e. how that relation between the
two distance matrices should have arisen) had the
permutation test given a small p-value
for the null, and also no suggested parameterization of that relationship
that reduced the data summary to a possible explanatory mechanism
such as a diffusion. Just as well, then, that the null was
hardly ever rejected. Had it been, the presenters would
have been at a loss as to what to say or do next.
By comparison, the diffusion approach is not really a
null-hypothesis test. The hypothesis -- diffusion -- is known in
advance to be true. The task is instead the estimation of a
coefficient, namely, an
expected increment in mean squared distance per unit elapsed
time, or the equivalent in other domains.
This is not a correlation of two "distances," then,
but the calibration of a sophisticated univariate regression
model. Please note that diffusion models and other random-walk models
have no equivalent either for the mean or for the covariance
structures that we use in garden-variety multivariate
biometrics. The formalisms deal with other kinds of
stochastics entirely, and there isn't much interoperability
between them. The Procrustes method, for instance, presumes
a mean form. That is reason enough not to put realistic diffusions on
top of it -- in reality diffusions don't have means, and they have
not velocities (vectors) but only speeds.
2. By virtue of the interconvertibility of Procrustes distance
with principal coordinates, we have useful parametric alternative
models from geometric morphometrics that permit proper statements of
one actual hypothesis-testing context here, along with a much more
powerful test whenever population values of things like
means and covariances can be assumed to exist.
Assuming that at least one distance matrix for the Mantel test is
a Procrustes distance matrix, we convert it to the vectors of
shape coordinates that express the same information. (See my comment
on that Theobald article, posted to this news group a few weeks ago.)
Then there are two cases, according to whether the second matrix is or
is not similarly convertible to principal coordinates in a similarly
meaningful way.
(2a) Both distances arise from coordinates.
The obvious multivariate set-up is that of a multivariate
Gaussian relationship on the underlying coordinates X, Y.
The simplest alternate hypothesis to the Mantel null is then
a nonzero covariance matrix C between X and Y. For the simplest pedagogy,
assume that this covariance is carried by only one
of the X-coordinates vis-a-vis only one of the Y-coordinates, and
that the X's and Y's are spherical separately. Then the
covariance matrix of the concatenated data vector (X | Y)
takes the form
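\Sigma = \begin{pmatrix} I & E \\ E^{T} & I \end{pmatrix}, \qquad
E = \begin{pmatrix} e & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}   (**)
(each I an identity matrix of the size of its block, E the cross-block
covariance with e in its upper left corner and zeros elsewhere),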
and we can learn a lot by simulating various tests on
data drawn from this distribution for various values of e.
(As I noted in my comment to Theobald, it doesn't matter
what the mean of the distribution is -- in morphometrics,
we hardly ever care.) This model is the model of
canonical correlations between the blocks with only one
nonzero canonical correlation.
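One way to draw cases from (**) is sketched below in Python. It is an assumption about how such a simulation could be set up, not the code used for the results reported here: the function name and seed are hypothetical, and the default block sizes of 6 and 3 are taken from the set-up described in this note. Every coordinate is a standard Gaussian, and only the first Y coordinate is mixed with the first X coordinate so that their covariance is e while all variances stay at 1.

```python
import random
from math import sqrt


def draw_sample(n, e, p=6, q=3, seed=0):
    """Draw n cases (x, y) with unit variances and cov(X1, Y1) = e, else zero.

    Requires -1 <= e <= 1.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Mix the first X coordinate into the first Y coordinate:
        # var = e^2 + (1 - e^2) = 1, and cov with x[0] is e.
        y[0] = e * x[0] + sqrt(1.0 - e * e) * y[0]
        xs.append(x)
        ys.append(y)
    return xs, ys
```

A call such as draw_sample(100, 0.35) gives one simulated sample of 100 cases. The mean is left at zero, since it does not matter for any of the tests discussed.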
I spent a pleasant weekend drawing samples of
100, over and over, from this distribution, and looking
at various test statistics for hypotheses about e.
In my set-up, there were six coordinates in X, as
there might be for shape coordinates on five landmarks
in 2D, and three coordinates in Y, as for size-shape
space of a triangle or for latitude, longitude,
altitude of a geographical ecophenotypy study.
The single dimension of the X-block in the first
shape coordinate slot could be, say, the vertical uniform component
of those shape coordinates. That for the Y's could be log
Centroid Size (from Procrustes form
space for a triangle) or perhaps a linear combination
of geographical coordinates in the ecophenotypy version.
I considered three statistics
for testing for a nonzero e in (**):
the first PLS singular value, the first canonical correlation
(just because I've always wondered how those would relate), and a
Mantel test. On samples of 100 the 0.05 tail of the
null distribution (values when e = 0) for either the first canonical
correlation or the first PLS singular value was the
median (i.e., the "0.50 tail") when e is equal to about
0.35. (This is probably analytically derivable from the
usual asymptotics of the first canonical correlation, but my copy of
Anderson isn't handy as I write this.) In other words,
the power at 0.05 of a PLS two-block analysis on this model for
samples of 100 is 50% just when e is around 0.35. At that value of
e, though, the power of the Mantel test is virtually
zero: hardly any Mantel correlations at e = .35 rise
above the .95 fractile for e = 0 that would render them
"significant" in tests of that null by this formulation.
The Mantel test doesn't actually rise to 50% power at that 0.05
level until e is about .67. Compared to the e~0.35 of the PLS/cancorr
versions, this is the requirement that nearly four
times as much cross-block variance be explained (explained variance
goes as the square of e) if the test
is to have a 50% chance of finding the signal we know
we put there. Such low
power should be considered fatal for any application permitting
a proper covariance-modeled alternative.
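The "nearly four times" figure can be checked by hand. Since all variances in (**) are 1, e is the correlation between the two coupled coordinates, and the share of variance explained is e squared. Taking the two rounded thresholds quoted above:

```latex
\left(\frac{0.67}{0.35}\right)^{2} = \frac{0.4489}{0.1225} \approx 3.66
```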
Note: the PLS and can. corr. tests are one-tailed, as those
statistics are necessarily positive. The Mantel test
statistic, an empirical correlation, can be either
positive or negative, and so the test takes a two-tailed form.
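For comparison, a one-tailed permutation test on the first PLS singular value might look like the Python sketch below. It is an illustration under stated assumptions, not the code behind the numbers in this note: the function names are hypothetical, the singular value is found by power iteration because the sketch uses only the standard library, and the permutation shuffles the cases of one block against the other.

```python
import random
from math import sqrt


def cross_cov(xs, ys):
    """p x q matrix of sample cross-covariances between the two blocks."""
    n, p, q = len(xs), len(xs[0]), len(ys[0])
    mx = [sum(x[a] for x in xs) / n for a in range(p)]
    my = [sum(y[b] for y in ys) / n for b in range(q)]
    return [[sum((x[a] - mx[a]) * (y[b] - my[b]) for x, y in zip(xs, ys)) / (n - 1)
             for b in range(q)] for a in range(p)]


def first_singular_value(c, n_iter=200):
    """Largest singular value of c, by power iteration on c^T c."""
    p, q = len(c), len(c[0])
    v = [1.0 / (b + 1) for b in range(q)]  # arbitrary nonzero starting vector
    for _ in range(n_iter):
        u = [sum(c[a][b] * v[b] for b in range(q)) for a in range(p)]  # u = c v
        w = [sum(c[a][b] * u[a] for a in range(p)) for b in range(q)]  # w = c^T u
        norm = sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in w]
    u = [sum(c[a][b] * v[b] for b in range(q)) for a in range(p)]
    return sqrt(sum(t * t for t in u))  # length of c v for the unit vector v


def pls_perm_test(xs, ys, n_perm=500, seed=0):
    """Observed first singular value and its one-tailed permutation tail."""
    rng = random.Random(seed)
    s_obs = first_singular_value(cross_cov(xs, ys))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.sample(ys, len(ys))  # reassign Y cases to X cases at random
        if first_singular_value(cross_cov(xs, shuffled)) >= s_obs:
            exceed += 1
    return s_obs, exceed / n_perm
```

Only the upper tail is counted, because the singular value cannot be negative.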
This pattern is intuitively sensible. The
Mantel test doesn't know that the meaningful fluctuation
of the distance-distance relationship is in only one
direction per block, and so it must consider the sums of
squares in all directions within both blocks. Most of this
is noise, by assumption, and thus completely
uninformative for the actual problem at hand (estimating a
value for e and deciding if it should be considered nonzero).
The resulting permutation distribution is
completely befogged by all this unnecessary noise,
but the coordinate-based multivariate techniques
can keep all this in order.
(2b) One distance arises from Procrustes coordinates,
but the other does not. We simulate that in the same model (**)
by regressing the "nondecomposable" distance (say,
dissimilarity of predation profiles) against the appropriate
(quadratic) term in the decomposable (Procrustes) one.
The prediction is that squared distance should be linear
in squared difference of the correct shape coordinate.
For the model (**), that's the correlation of squared Euclidean distances
DX_ij^2 from the first block, for instance, on squared scalar differences
(Y1_i - Y1_j)^2 from the first variable in the second block.
The correlation is again perm-tested
over the two full 100x100 matrices involved (after deletion
of diagonals). Remember that 100 is the simulated sample size here,
not the size of the matrix in (**).
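The two matrices entering this compromise computation can be built as in the Python sketch below. The function names are hypothetical and the code is an illustration only. The resulting correlation would then be permutation-tested by scrambling rows and columns together, exactly as for the regular Mantel test.

```python
from math import sqrt


def sq_dist_matrix(xs):
    """Squared Euclidean distances between all pairs of cases in a block."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in xs] for xi in xs]


def sq_diff_matrix(v):
    """Squared pairwise differences of a single scalar variable."""
    return [[(a - b) ** 2 for b in v] for a in v]


def offdiag_corr(m1, m2):
    """Pearson correlation of the above-diagonal entries of two square matrices."""
    n = len(m1)
    u = [m1[i][j] for i in range(n) for j in range(i + 1, n)]
    v = [m2[i][j] for i in range(n) for j in range(i + 1, n)]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)
```

With xs the list of X-block coordinate vectors and ys the list of Y-block vectors, the observed statistic is offdiag_corr(sq_dist_matrix(xs), sq_diff_matrix([y[0] for y in ys])).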
In an extensive simulation (five whole minutes of CPU time
on my laptop) I checked permutation-test significance levels,
still at e = 0.67, of all three approaches:
the PLS or cancorr procedure, which suits the Gaussian model I have
suggested; the full Mantel procedure, distance versus
distance, which, as I've tried to explain, doesn't really suit us at all;
and the compromise, correlating squared
Euclidean distance within the X-block against the
squared difference of the relevant Y coordinate (the first).
The permutation tactic here moves both rows and columns
just as for the regular Mantel test.
The results are clear. At e = 0.67, in 100 samples of 100 cases
from the Gaussian (**), checked by 500 permutations each,
the empirical PLS first singular value was
exceeded only once in 50,000 permutations; the correlation
of X-block squared Euclidean distance with squared Y1 difference
had a tail-probability averaging 0.011; but the regular
Mantel test had a tail-probability averaging 0.079
for the same data. (Need I point out that this difference
straddles the infamous 0.05 boundary?) For the 40
samples out of 100 that had Mantel correlations closest
to the target value of 0.102 (95th percentile of the null
distribution), the mean of the Mantel permutation test
statistic was 0.0265 (close enough to the intended
0.05/2 value), whereas for every sample with first
PLS singular value above 0.65 -- the 93 (out of 100) largest values
of this statistic, which has mean 0.72 and s.d. 0.06 --
none of 500 permutations supplied a larger interblock
predictability, meaning that in all these simulations
the null (** with e=0) would be rejected at p~0.002 or so.
The tail-probability for significance testing in the
compromise computation, correlation of DX^2 with (Delta Y1)^2 --
the tail-probability that averaged .011 -- was very highly skewed,
with 74 values (out of 100 samples) being .004 or less (0, 1, or 2
exceedances out of 500 permutations). This long tail is to be expected, as the
square of a Gaussian difference has pretty high sampling
variance. Removing the long tails (taking Euclidean
distances rather than squares in both blocks) leads
to a mean tail-probability of 0.0087, which is a bit more
satisfactory. The tail-probabilities of the same model
in the other direction (that is, correlation of Y-distances
DY_ij with (X1_i - X1_j)^2 over the two 100x100 matrices)
averaged .004, roughly twice as good, perhaps because
the number of dimensions out of which we selected one
(the first one) in the X-block is twice the number of dimensions
from which we selected the first from the Y-block.
Regardless of these and other details,
the rank-ordering of the techniques is clear. When the
true situation is as in (**), a one-dimensional crosscovariance
structure between two blocks of data, then for estimating
the existence and extent of the modeled covariance, the Mantel
test is worst, the cancorr or PLS version of (**) is
best, and the compromise is, well, a compromise, performing
at a level that is perhaps deducible from
the performance of the Mantel by a Bonferroni-like consideration.
So clearly the Mantel test should never be used in applications
that are at all amenable to the setup of (**) (which is, to repeat,
the most typical two-block set-up across GMM). The Mantel test is of
unacceptably low power when both distances arise from
principal coordinates, where the alternate hypothesis
could take the form of some rotation of the model (**).
But it is also of unacceptably low power when even ONE block
arises from principal coordinates, unless it is quite unknown a priori
which dimension in that principal coordinate space is
most likely to account for the cross-block prediction. The Mantel
procedure is thus inappropriate for use with
Procrustes distances under most foreseeable practical GMM
settings -- in all contexts except that of the diffusion
model, Mantel's original setting,
where the Gaussian set-up of (**) would be quite absurd.
But in that context the coefficient we need to estimate is
in units of squared distance per unit time, and it is
the slope of a regression through the origin, not a correlation.
3. In my judgment, the three-way menu here --
a true diffusion model, prior knowledge of a causally privileged
principal coordinate, or else two full complements of
principal coordinates -- is a reasonable categorization to impose atop
the biometric exploration of Procrustes data. A
"significance test" that lacks a proper (quantitative, stochastic)
alternative hypothesis is not well-formed. It cannot be reinterpreted in a
coherent Bayesian way and cannot be thrown into the likelihood format that
serves us so well in many other high-dimensional
biometrical domains. In the absence of an alternative
hypothesis, it is not clear why the Mantel test should be
considered a "significance test" at all. This is no critique
of Mantel, who had such an alternative in mind, just as Wright did.
But the users of this test reporting their work in Philadelphia seem to
have had no such alternative, nor were they aware that multivariate
analyses of the Procrustes coordinates would necessarily be
more powerful than analysis of explicit Procrustes distances
for this alternative had it been uttered.
Most distance matrices can be usefully reduced
to principal coordinates except in the context of random
drift or diffusion, where the model (**) is false in
presuming any sort of population covariance
structure for modeling the variation of samples
from the outset. My suggestion is the obvious one:
that except for principled reductionistic isotropic
models like diffusion, Mantel tests of two
sets of shape coordinates be replaced by PLS, and Mantel
tests of one Procrustes distance against a distance or dissimilarity that
is not Procrustes-derived or otherwise Euclidean take
the form of correlation between the nondecomposable distance/dissimilarity
and the squared differences in the hypothesized
direction of causal connection. Either of these
is far more powerful than the Mantel test for this particularly
reasonable class of alternative biometric hypotheses.
Putting all this another way: the applications of the Mantel
test that I saw in Philadelphia were not really about distance-distance
correlations -- they were really about coefficients like the single
parameter e in my Gaussian model (**), but they were phrased
inappropriately, and so they were tested inappropriately. If (**) is
what you are actually thinking about, compute using the Procrustes
coordinates, not just Procrustes distance, and force yourself to write down
the single dimension of the other block that ought to be
correlated with position in this space. If you have principal
coordinates or Procrustes coordinates for both blocks, use them, and
don't bother with the distance representation any further at all, except when
the findings appear to be isotropic, or when there is no
mean or covariance structure in sight. Even in that case,
the rejection of the Mantel null doesn't actually explain
anything. It just blocks the null explanation, and thus permits
you to get on with your dissertation, or your grant proposal,
or whatever. If you want to explain something about distances,
in the absence of a well-founded diffusion model
you won't be able to make any progress using just distance
representations. Sooner or later you will need principal
coordinates, and the sooner you recall that the Procrustes
distances actually arose that way, the better your
morphometrics will be.
Preparation of this note was supported in part by grant
P200.093/1-VI/2004 from the Austrian Council for Science and Technology to
the Department of Anthropology, University of Vienna, Austria,
and by the sixth European Union Framework Programme of
Research and Technological Development under contract
MRTN-CT-2005-019564, again to the University of Vienna.
Fred Bookstein