April 24, 2007

 TO: the Morphmet readership
 FROM: Fred Bookstein
 RE: Mantel tests and morphometrics

 At the American Association of Physical Anthropologists congress in
 Philadelphia last month, Mantel tests of Procrustes distance against
 some other "distance" or "dissimilarity" appeared a few times in the
 course of presentations about the relation of shape to other
 sorts of biological processes (such as evolution or
 ecophenotypy).  I thought the Mantel test was
 a really poor choice in every example I saw. That suggested that its
 use was not the fault of the individual presenters but was
 owed instead to some systematic misunderstanding of
 the maneuver by our community in the first place.
 An argument might then be possible that would persuade a potential
 user to abandon the idea of exploiting this test well
 prior to any inspection of data.    This is a first draft
 of that argument.

         The Mantel test (named for Nathan Mantel, 1919-2002,
 American biostatistician) applies
 to data in the form of two (or more) distance matrices on the same
 sample of cases (in the original application, cases of a disease
 under study).  Nowadays it takes the form of a permutation computation
 for assessing the observed correlation of
 entries between the distance matrices.  Because distances are correlated
 across the rows (or columns) of a distance matrix, the permutation
 procedure needs to be clever: it must scramble the rows _and_
 columns together.  In Splus notation, this is something like

           junk<-sample(N); permdist<-dist[junk,junk]

 where the operator "sample(N)" returns a random permutation of the
 integers from 1 to N (sample size). A Mantel test is said to be
 "statistically significant" (but I will argue below that this statement
 is logically incoherent) just when the correlation of the off-diagonal
 parts of the two distance matrices is greater than the 0.95 fractile,
 or whatever you choose, in the distribution of the correlations
 computed after one matrix is replaced by these permuted
 pseudodistance matrices.
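
 The permutation procedure just described can be sketched in Python with
 NumPy (rather than Splus). This is a minimal illustration, not a
 reference implementation: the function names, the default of 999
 permutations, and the toy data are assumptions, and only the upper tail
 is computed, matching the "0.95 fractile" wording above.

```python
import numpy as np

def offdiag(d):
    """Entries above the diagonal of a square symmetric matrix."""
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]

def mantel(d1, d2, n_perm=999, seed=0):
    """Upper-tail permutation Mantel test: returns (observed r, tail probability)."""
    rng = np.random.default_rng(seed)
    n = d1.shape[0]
    target = offdiag(d2)
    r_obs = np.corrcoef(offdiag(d1), target)[0, 1]
    exceed = 0
    for _ in range(n_perm):
        junk = rng.permutation(n)            # plays the role of sample(N)
        permdist = d1[np.ix_(junk, junk)]    # rows and columns move together
        if np.corrcoef(offdiag(permdist), target)[0, 1] >= r_obs:
            exceed += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (exceed + 1) / (n_perm + 1)

# Toy check: a distance matrix tested against itself.
rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 2))
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
r_same, p_same = mantel(d, d, n_perm=199)
print(r_same, p_same)
```

 Indexing with the same permutation on both axes is what keeps each
 permuted matrix a legitimate distance matrix on relabeled cases.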

         At the AAPA meeting, presenters were applying Mantel tests to
 Procrustes distance matrices against other interspecimen or
 intersample distance or dissimilarity matrices and announcing
 either that (i) the null hypothesis was rejected, in which case the
 two distances must have had something to do with each other, or (ii)
 that the null hypothesis was not rejected, in which case the presenter
 just went on to the next slide.

         My purpose in this note is to open a thread by arguing
 that this whole approach -- the application of a Mantel test in
 connection with Procrustes distance -- is a bad idea, and should
 never be used.

         My argument will have two parts: the problem of the alternative
 with respect to which the Mantel permutation distribution is the null,
 and the remarkably low power of the Mantel procedure itself in
 comparison to alternatives that look more like our regular biometrical
 hypotheses. Following the second there is, as is my wont, a concluding
 sermon.

        Historical note.  My argument here appears at first to be the
 opposite of an argument published a few years ago by Legendre's group
 at McGill
 (Dutilleul, Stockwell, Frigon, and Legendre, "The Mantel Test versus
 Pearson's Correlation Analysis: Assessment of the Differences for
 Biological and Environmental Studies,"
 Journal of Agricultural, Biological, and Environmental Statistics
 5:131-150, 2000). The Dutilleul et al. simulations
 concern autocorrelated blocks X, Y consisting of only a single
 variable each, a special case unlikely to be of any direct
 relevance to geometric morphometrics [GMM] since shape coordinates come in
 twos or threes. The context of diffusion presumably accounts for that
 postulated autocorrelation structure.
 My assertion, in particular, that morphometric hypotheses
 cannot be validly stated in terms of "distance or closeness" seems
 inconsistent with the assumption of Dutilleul et al. (p. 149) that
 "users of the Mantel test should be fully aware that the hypotheses
 tested are stated in terms of distances or closeness," but the discrepancy
 can be resolved by inferring that they
 (the users) are not doing morphometrics.  Certainly
 the authors' conclusion that "under the multivariate normal model for
 the raw data, the use of squared Euclidean distances in the Mantel
 test provides a situation in which the Mantel test and Pearson's
 correlation test agree" does not appear to extend to the multivariate
 model to be introduced under (**) here.

        1.  The Mantel test is not a "test of association" as we
 biometricians usually use that phrase.  "Association" is between
 variables measured on one case, specimen, or group at a time, and the
 Mantel test doesn't talk about those at all; it talks about distances
 or dissimilarities relating _two_ cases or specimens or groups.  So, no
 matter what the permutation procedure, it can't be thought of as a
 significance test.  Let me take a little more space to explain this.

        For a null-hypothesis significance test to make sense, there
 need to be two probabilistic models, not one, and the models have
 to generate the data (i.e. the values we observe case by case).
 In the first, "null" model, usually some
 parameter (scalar or vector) of the population is zero; in the second
 model, it is not. In Mantel's original application, one distance was
 geographical and the other temporal.
 The alternative hypothesis was then, at least in
 principle, a pretty rigorous one: that the disease cluster under
 study was disseminated in part by diffusion.  (Models like this
 had appeared in the evolutionary literature as early as the 1940's,
 in some brilliant data analyses by Sewall Wright.)  Assuming vectors
 without jump processes (i.e. no jet travel, no
 macromutations), the variance of a diffusion process
 is linear in time.  There follows a fairly
 strict alternate model in the form of a regression of squared
 geographical distance on time, a regression that would have to have
 constant slope and zero intercept.  (Mantel himself may not have fit models
 like these, but Wright did, regressing phenetic distance on geographic
 distance, with time the implicit predictor accounting for both.)
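
 As an illustration of that alternate model, here is a small Python
 sketch of a regression of squared distance on elapsed time, forced
 through the origin. Everything in it is an assumption made for the
 example: an isotropic two-dimensional Gaussian diffusion from a common
 starting point, a hypothetical variance rate sigma2 per coordinate per
 unit time, and elapsed times drawn uniformly between 1 and 10.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.5       # hypothetical variance per coordinate per unit time
n = 20000          # number of simulated displacements

t = rng.uniform(1.0, 10.0, size=n)                             # elapsed times
pos = rng.normal(size=(n, 2)) * np.sqrt(sigma2 * t)[:, None]   # 2D displacements
sq_dist = (pos ** 2).sum(axis=1)                               # squared distances

# Least-squares slope for a line through the origin.
slope = (t * sq_dist).sum() / (t ** 2).sum()

# An ordinary fit with a free intercept, to check that the intercept is near zero.
free_slope, intercept = np.polyfit(t, sq_dist, 1)

print(slope, 2 * sigma2)   # the slope estimates (dimensions) * sigma2
print(intercept)
```

 The quantity of interest is the slope, in units of squared distance per
 unit time; no correlation coefficient enters the computation.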

        The applications at AAPA that I saw never took this form.
 In these talks there was, as best I recall, no alternate model,
 no statement about what the relationship between the two distance
 matrices should look like (i.e. how that relation between the
 two distance matrices should have arisen) had the
 permutation test given a small p-value
 for the null, and also no suggested parameterization of that relationship
 that reduced the data summary to a possible explanatory mechanism
 such as a diffusion.  Just as well, then, that the null was
 hardly ever rejected. Had it been, the presenters would
 have been at a loss as to what to say or do next.

        By comparison, the diffusion approach is not really a
 null-hypothesis test. The hypothesis -- diffusion -- is known in
 advance to be true. The task is instead the estimation of a
 coefficient, namely, an expected increment in mean squared distance
 per unit elapsed time, or the equivalent in other domains.
 This is not a correlation of two "distances," then,
 but the calibration of a sophisticated univariate regression
 model.  Please note that diffusion models and other random-walk models
 have no equivalent either for the mean or for the covariance
 structures that we use in garden-variety multivariate
 biometrics. The formalisms deal with other kinds of
 stochastics entirely, and there isn't much interoperability
 between them.  The Procrustes method, for instance, presumes
 a mean form. That is reason enough not to put realistic diffusions on
 top of it -- in reality diffusions don't have means, and
 they have not velocities (vectors) but only speeds.

        2.  By virtue of the interconvertibility of Procrustes distance
 with principal coordinates, we have useful parametric alternative
 models from geometric morphometrics that permit proper statements of
 one actual hypothesis-testing context here, along with a much more
 powerful test whenever population values of things like
 means and covariances can be assumed to exist.

        Assuming that at least one distance matrix for the Mantel test is
 a Procrustes distance matrix, we convert it to the vectors of
 shape coordinates that express the same information.  (See my comment
 on that Theobald article, posted to this news group a few weeks ago.)
 Then there are two cases, as the second matrix is or is not similarly
 convertible to principal coordinates in a similarly meaningful way.
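
 The conversion from a distance matrix back to coordinates can be
 sketched as classical scaling (principal coordinates analysis). The
 Python below is a minimal illustration that assumes the distances are
 Euclidean, so that no large negative eigenvalues arise; the function
 name and the tolerance for dropping null dimensions are arbitrary
 choices for this example.

```python
import numpy as np

def principal_coordinates(d):
    """Classical scaling: coordinates whose Euclidean distances reproduce d."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]                # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals[0]                 # drop null or negative dimensions
    return vecs[:, keep] * np.sqrt(vals[keep])

# Toy check: distances among 12 points in 3 dimensions.
rng = np.random.default_rng(0)
pts = rng.normal(size=(12, 3))
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
pc = principal_coordinates(d)
d_back = np.sqrt(((pc[:, None, :] - pc[None, :, :]) ** 2).sum(axis=-1))
print(pc.shape, np.abs(d - d_back).max())
```

 The recovered coordinates differ from the originals by a rotation,
 reflection, and translation at most, which is why the distances carry
 the same information as the coordinates.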

        (2a) Both distances arise from coordinates.
 The obvious multivariate set-up is that of a multivariate
 Gaussian relationship on the underlying coordinates X, Y.
 The simplest alternate hypothesis to the Mantel null is then
 a nonzero covariance matrix C between X and Y.  For the simplest pedagogy,
 assume that this covariance is carried by only one
 of the X-coordinates vis-a-vis only one of the Y-coordinates, and
 that the X's and Y's are spherical separately.  Then the
 covariance matrix of the concatenated data vector (X | Y)
 takes the form

                               e 0 0  ... 0
                        I      0 0 0  ... 0
                                   ...
      \Sigma  =                0 0 0  ... 0                 (**)
                   e 0  ... 0
                   0 0  ... 0
                       ...           I
                   0 0  ... 0

 and we can learn a lot by simulating various tests on
 data drawn from this distribution for various values of e.
 (As I noted in my comment to Theobald, it doesn't matter
 what the mean of the distribution is -- in morphometrics,
 we hardly ever care.)  This model is the model of
 canonical correlations between the blocks with only one
 nonzero canonical correlation.
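
 For readers who want to repeat this kind of simulation, the covariance
 matrix (**) and a sampler for it can be sketched in Python as follows.
 The function names are hypothetical, the block sizes p = 6 and q = 3
 are the ones used in the simulations reported in this note, and the
 mean is set to zero since it does not matter here.

```python
import numpy as np

def sigma_star(e, p=6, q=3):
    """Covariance (**): identity within blocks, a single cross-covariance e."""
    s = np.eye(p + q)
    s[0, p] = s[p, 0] = e      # first X coordinate with first Y coordinate
    return s

def draw_sample(e, n=100, p=6, q=3, rng=None):
    """Draw n cases from the Gaussian (**) and split them into blocks X and Y."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.multivariate_normal(np.zeros(p + q), sigma_star(e, p, q), size=n)
    return z[:, :p], z[:, p:]

x, y = draw_sample(0.67, n=5000, rng=np.random.default_rng(0))
print(np.corrcoef(x[:, 0], y[:, 0])[0, 1])   # should be near 0.67
```

 The matrix is positive definite whenever |e| < 1, since its eigenvalues
 are 1 + e, 1 - e, and 1.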

        I spent a pleasant weekend drawing samples of
 100, over and over, from this distribution, and looking
 at various test statistics for hypotheses about e.
 In my set-up, there were six coordinates in X, as
 there might be for shape coordinates on five landmarks
 in 2D, and three coordinates in Y, as for size-shape
 space of a triangle or for latitude, longitude,
 altitude of a geographical ecophenotypy study.
 The single dimension of the X-block in the first
 shape coordinate slot could be, say, the vertical uniform component
 of those shape coordinates. That for the Y's could be log
 Centroid Size (from Procrustes form
 space for a triangle) or perhaps a linear combination
 of geographical coordinates in the ecophenotypy version.

          I considered three statistics
 for testing for a nonzero e in (**):
 the first PLS singular value, the first canonical correlation
 (just because I've always wondered how those would relate), and a
 Mantel test.  On samples of 100 the 0.05 tail of the
 null distribution (values when e = 0) for either the first canonical
 correlation or the first PLS singular value was the
 median (i.e., the "0.50 tail") when e is equal to about
 0.35.  (This is probably analytically derivable from the
 usual asymptotics of the first canonical correlation, but my copy of
 Anderson isn't handy as I write this.) In other words,
 the power at 0.05 of a PLS two-block analysis on this model for
 samples of 100 is 50% just when e is around 0.35.  At that value of
 e, though, the power of the Mantel test is virtually
 zero: hardly any Mantel correlations at e = .35 rise
 above the .95 fractile for e = 0 that would render them
 "significant" in tests of that null by this formulation.

        The Mantel test doesn't actually rise to 50% power at that 0.05
 level until e is about .67.  Compared to the e~0.35 of the PLS/cancorr
 versions, this is the requirement that nearly four
 times as much cross-block variance be explained (the ratio of squared
 covariances, (0.67/0.35)^2, is about 3.7) if the test
 is to have a 50% chance of finding the signal we know
 we put there. Such low
 power should be considered fatal for any application permitting
 a proper covariance-modeled alternative.
      Note: the PLS and can. corr. tests are one-tailed, as those
 statistics are necessarily positive.  The Mantel test
 statistic, an empirical correlation, can be either
 positive or negative, and so the test takes a two-tailed form.
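
 The two coordinate-based statistics in this comparison can be computed
 as in the Python sketch below. It is only an illustration under stated
 assumptions: the function names are hypothetical, the PLS statistic is
 taken to be the largest singular value of the sample cross-covariance
 matrix, and the canonical correlation is obtained from QR
 decompositions of the centered blocks, which requires each block to
 have full column rank.

```python
import numpy as np

def first_pls_singular_value(x, y):
    """Largest singular value of the sample cross-covariance matrix of X and Y."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    cross = xc.T @ yc / (len(x) - 1)
    return np.linalg.svd(cross, compute_uv=False)[0]

def first_canonical_correlation(x, y):
    """Largest canonical correlation, via orthonormal bases of the two blocks."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)[0]

# One sample of 100 from the model (**), built directly with e = 0.35.
rng = np.random.default_rng(0)
n, p, q, e = 100, 6, 3, 0.35
x = rng.normal(size=(n, p))
y = rng.normal(size=(n, q))
y[:, 0] = e * x[:, 0] + np.sqrt(1 - e ** 2) * y[:, 0]   # cov(X1, Y1) = e, unit variance

sv = first_pls_singular_value(x, y)
cc = first_canonical_correlation(x, y)
print(sv, cc)
```

 Both statistics are nonnegative by construction, which is why tests
 based on them are one-tailed.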

         This pattern is intuitively sensible.  The
 Mantel test doesn't know that the meaningful fluctuation
 of the distance-distance relationship is in only one
 direction per block, and so it must consider the sums of
 squares in all directions within both blocks.  Most of this
 is noise, by assumption, and thus completely
 uninformative for the actual problem at hand (estimating a
 value for e and deciding if it should be considered nonzero).
 The resulting permutation distribution is
 completely befogged by all this unnecessary noise,
 but the coordinate-based multivariate techniques
 can keep all this in order.
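
 A rough numerical check of this intuition, offered as a sketch rather
 than as part of the simulations reported here: for two zero-mean
 Gaussian variables with correlation e, the correlation of their squares
 is e^2, and summing in the pure-noise coordinates dilutes that to
 e^2 / sqrt(p*q) for the full squared distances. The Python below
 assumes independent pairs of cases (real Mantel entries share cases)
 and uses squared rather than raw distances.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, p, q, e = 200_000, 6, 3, 0.67

# Differences between two independent cases drawn from (**): every coordinate
# difference has variance 2, and only the first X and first Y are coupled.
dx = rng.normal(size=(n_pairs, p)) * np.sqrt(2)
dy = rng.normal(size=(n_pairs, q)) * np.sqrt(2)
dy[:, 0] = e * dx[:, 0] + np.sqrt(1 - e ** 2) * dy[:, 0]

# Correlation using only the coordinate that carries the signal in each block.
r_coord = np.corrcoef(dx[:, 0] ** 2, dy[:, 0] ** 2)[0, 1]
# Correlation of the full squared distances, as a Mantel-style statistic sees them.
r_dist = np.corrcoef((dx ** 2).sum(axis=1), (dy ** 2).sum(axis=1))[0, 1]

print(r_coord, e ** 2)
print(r_dist, e ** 2 / np.sqrt(p * q))
```

 With p = 6 and q = 3 the dilution factor is sqrt(18), a bit more than 4.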

         (2b) One distance arises from Procrustes coordinates,
 but the other does not.  We simulate that in the same model (**)
 by regressing the "nondecomposable" distance (say,
 dissimilarity of predation profiles) against the appropriate
 (quadratic) term in the decomposable (Procrustes) one.
 The prediction is that squared distance should be linear
 in squared difference of the correct shape coordinate.
 For the model (**), that's the correlation of squared Euclidean distances
 DX_ij^2 from the first block, for instance, with squared scalar differences
 (Y1_i - Y1_j)^2 from the first variable in the second block.
 The correlation is again perm-tested
 over the two full 100x100 matrices involved (after deletion
 of diagonals).  Remember that 100 is the simulated sample size here,
 not the size of the matrix in (**).
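
 This compromise computation can be sketched in Python as follows. The
 function names, the toy sample (80 cases with e = 0.95, chosen only so
 that the example is clear-cut), and the plain exceedance fraction used
 as the tail-probability are assumptions of this illustration.

```python
import numpy as np

def offdiag(m):
    """Entries above the diagonal of a square symmetric matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def compromise_test(x, y1, n_perm=500, seed=0):
    """Correlate squared X-block distances with squared differences in y1."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    dx2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)   # DX_ij^2
    dy2 = (y1[:, None] - y1[None, :]) ** 2                      # (Y1_i - Y1_j)^2
    target = offdiag(dx2)
    r_obs = np.corrcoef(target, offdiag(dy2))[0, 1]
    exceed = 0
    for _ in range(n_perm):
        junk = rng.permutation(n)                # rows and columns move together
        r_perm = np.corrcoef(target, offdiag(dy2[np.ix_(junk, junk)]))[0, 1]
        if r_perm >= r_obs:
            exceed += 1
    return r_obs, exceed / n_perm

# Toy data from the model (**) with a strong signal.
rng = np.random.default_rng(0)
n, p, e = 80, 6, 0.95
x = rng.normal(size=(n, p))
y1 = e * x[:, 0] + np.sqrt(1 - e ** 2) * rng.normal(size=n)
r_obs, tail = compromise_test(x, y1, n_perm=200)
print(r_obs, tail)
```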

        In an extensive simulation (five whole minutes of CPU time
 on my laptop) I checked permutation-test significance levels,
 still at e = 0.67, of all three approaches:
 the PLS or cancorr procedure, which suits the Gaussian model I have
 suggested; the full Mantel procedure, distance versus
 distance, which, as I've tried to explain, doesn't really suit us at all;
 and the compromise, correlating squared
 Euclidean distance within the X-block against the
 squared difference of the relevant Y coordinate (the first).
 The permutation tactic here moves both rows and columns
 just as for the regular Mantel test.
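
 For the PLS statistic the text does not spell out the permutation
 scheme, so the Python sketch below makes an explicit assumption: the
 case order of one block is shuffled relative to the other, which breaks
 any cross-block association while leaving each block intact. Function
 names are hypothetical.

```python
import numpy as np

def pls_sv(x, y):
    """First singular value of the sample cross-covariance matrix."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    return np.linalg.svd(xc.T @ yc / (len(x) - 1), compute_uv=False)[0]

def pls_permutation_test(x, y, n_perm=500, seed=0):
    """Fraction of row-shuffled Y blocks giving a singular value at least as large."""
    rng = np.random.default_rng(seed)
    s_obs = pls_sv(x, y)
    exceed = 0
    for _ in range(n_perm):
        shuffled = y[rng.permutation(len(y))]    # reorder cases in the Y block only
        if pls_sv(x, shuffled) >= s_obs:
            exceed += 1
    return s_obs, exceed / n_perm

# One sample of 100 from the model (**) with e = 0.67.
rng = np.random.default_rng(0)
n, p, q, e = 100, 6, 3, 0.67
x = rng.normal(size=(n, p))
y = rng.normal(size=(n, q))
y[:, 0] = e * x[:, 0] + np.sqrt(1 - e ** 2) * y[:, 0]
s_obs, tail = pls_permutation_test(x, y, n_perm=500)
print(s_obs, tail)
```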

        The results are clear.  At e = 0.67, in 100 samples of 100 cases
 from the Gaussian (**), checked by 500 permutations each,
 the empirical PLS first singular value was
 exceeded only once in 50,000 permutations; the correlation
 of X-block squared Euclidean distance with squared Y1 difference
 had a tail-probability averaging 0.011; but the regular
 Mantel test had a tail-probability averaging 0.079
 for the same data.  (Need I point out that this difference
 straddles the infamous 0.05 boundary?)  For the 40
 samples out of 100 that had Mantel correlations closest
 to the target value of 0.102 (95th percentile of the null
 distribution), the mean of the Mantel permutation test
 statistic was 0.0265 (close enough to the intended
 0.05/2 value), whereas for every sample with first
 PLS singular value above 0.65 -- the 93 (out of 100) largest values
 of this statistic, which has mean 0.72 and s.d. 0.06 --
 none of 500 permutations supplied a larger interblock
 predictability, meaning that in all these simulations
 the null (** with e=0) would be rejected at p~0.002 or so.
        The tail-probability for significance testing in the
 compromise computation, correlation of DX^2 with (Delta Y1)^2 --
 the tail-probability that averaged .011 -- was very highly skewed,
 with 74 (out of 100 samples) values being .004 or less (0, 1, or 2
 exceedances out of 500 permutations). This long tail is to be expected,
 as the
 square of a Gaussian difference has pretty high sampling
 variance.  Removing the long tails (taking Euclidean
 distances rather than squares in both blocks) leads
 to a mean tail-probability of 0.0087, which is a bit more
 satisfactory.  The tail-probabilities of the same model
 in the other direction (that is, correlation of Y-distances
 DY_ij with (X1_i - X1_j)^2 over the two 100x100 matrices)
 averaged .004, roughly twice as good, perhaps because
 the number of dimensions out of which we selected one
 (the first one) in the X-block is twice the number of dimensions
 from which we selected the first from the Y-block.

         Regardless of these and other details,
 the rank-ordering of the techniques is clear. When the
 true situation is as in (**), a one-dimensional crosscovariance
 structure between two blocks of data, then for estimating
 the existence and extent of the modeled covariance, the Mantel
 test is worst, the cancorr or PLS version of (**) is
 best, and the compromise is, well, a compromise, performing
 at a level that is perhaps deducible from
 the performance of the Mantel by a Bonferroni-like consideration.

        So clearly the Mantel test should never be used in applications
 that are at all amenable to the setup of (**) (which is, to repeat,
 the most typical two-block set-up across GMM). The Mantel test is of
 unacceptably low power when both distances arise from
 principal coordinates, where the alternate hypothesis
 could take the form of some rotation of the model (**).
 But it is also of unacceptably low power when even ONE block
 arises from principal coordinates, unless it is quite unknown a priori
 which dimension in that principal coordinate space is
 most likely to account for the cross-block prediction.  The Mantel
 procedure is thus inappropriate for use with
 Procrustes distances under most foreseeable practical GMM
 settings -- in all contexts except that of the diffusion
 model, Mantel's original setting,
 where the Gaussian set-up of (**) would be quite absurd.
 But in that context the coefficient we need to estimate is
 in units of squared distance per unit time, and it is
 the slope of a regression through the origin, not a correlation.

         3.  In my judgment, the three-way menu here --
 a true diffusion model, prior knowledge of a causally privileged
 principal coordinate, or else two full complements of
 principal coordinates -- is a reasonable categorization to impose atop
 the biometric exploration of Procrustes data.  A
 "significance test" that lacks a proper (quantitative, stochastic)
 alternative hypothesis is not well-formed.  It cannot be reinterpreted
 in a coherent Bayesian way and cannot be thrown into the likelihood
 format that
 serves us so well in many other high-dimensional
 biometrical domains.  In the absence of an alternative
 hypothesis, it is not clear why the Mantel test should be
 considered a "significance test" at all.  This is no critique
 of Mantel, who had such an alternative in mind, just as Wright did.
 But the users of this test reporting their work in Philadelphia seem
 to have had no such alternative, nor were they aware that multivariate
 analyses of the Procrustes coordinates would necessarily be
 more powerful than analysis of explicit Procrustes distances
 for this alternative had it been uttered.

        Most distance matrices can be usefully reduced
 to principal coordinates except in the context of random
 drift or diffusion, where the model (**) is false in
 presuming any sort of population covariance
 structure for modeling the variation of samples
 from the outset. My suggestion is the obvious one:
 that except for principled reductionistic isotropic
 models like diffusion, Mantel tests of two
 sets of shape coordinates be replaced by PLS, and Mantel
 tests of one Procrustes distance against a distance or dissimilarity that
 is not Procrustes-derived or otherwise Euclidean take
 the form of correlation between the nondecomposable distance/dissimilarity
 and the squared differences in the hypothesized
 direction of causal connection.  Either of these
 is far more powerful than the Mantel test for this particularly
 reasonable class of alternative biometric hypotheses.

        Putting all this another way: the applications of the Mantel
 test that I saw in Philadelphia were not really about distance-distance
 correlations -- they were really about coefficients like the single
 parameter e in my Gaussian model (**), but they were phrased
 inappropriately, and so they were tested inappropriately.  If (**) is
 what you are actually thinking about, compute using the Procrustes
 coordinates,
 not just Procrustes distance, and force yourself to write down
 the single dimension of the other block that ought to be
 correlated with position in this space.  If you have principal
 coordinates or Procrustes coordinates for both blocks, use them, and
 don't bother with the distance representation any further at all,
 except when
 the findings appear to be isotropic, or when there is no
 mean or covariance structure in sight.  Even in that case,
 the rejection of the Mantel null doesn't actually explain
 anything. It just blocks the null explanation, and thus permits
 you to get on with your dissertation, or your grant proposal,
 or whatever. If you want to explain something about distances,
 in the absence of a well-founded diffusion model
 you won't be able to make any progress using just distance
 representations.  Sooner or later you will need principal
 coordinates, and the sooner you recall that the Procrustes
 distances actually arose that way, the better your
 morphometrics will be.

         Preparation of this note was supported in part by grant
 P200.093/1-VI/2004 from the Austrian Council for Science and Technology to
 the Department of Anthropology, University of Vienna, Austria,
 and by the sixth European Union Framework Programme of
 Research and Technological Development under contract
 MRTN-CT-2005-019564, again to the University of Vienna.

                          Fred Bookstein
