October 1, 2005

        I thank Jim Rohlf for his prompt and very interesting comment.
 He is correct, of course, that any MDS technique that
 relaxes the requirement of matching original distances must at
 the same time relax the identity of the resulting ordinations
 with PCOORD plots of the variables contributing to the original
 distances.  It is a tribute to his longevity in our field that
 the primary paper he cites is dated 1972.  At that time I was
 still a graduate student of social theory. It was years before
 I became a statistician, and in going back to pick up the
 literature I'd overlooked in a misspent youth, I never encountered
 Rohlf's indeed quite pertinent paper until yesterday's post.
        But that date of 1972 is also a helpful cue for what is
 (I hope) a useful continuing comment.  1972 was long before
 the onset of useful shape coordinates -- indeed before the
 original Kendall publication of 1977 (which none of us saw
 for years) that set the Procrustes metric on the road to
 its multivariate implementation.  Is it appropriate to
 apply a technique from 1972 to shape variables invented
 in the 1990s?  What assumptions does the 1972 method
 make about the origins of the data to which it is being applied?
        Jim's comment would apply to _any_ multivariate
 sum of squares, arising from any vector-valued data matrix
 -- shape coordinates, measured distances, measured angles,
 log-distances.  The technique does not even assume that the metric _is_ a sum of
 squares. Like the rest of the MDS family of ordinations,
 it can work (meaning: it can produce useful ordination plots)
 when applied to Manhattan metrics, string matching measures, perceived
 similarity from subjects in psychology experiments, whatever.
 But shape coordinates are not like that.  They arise from
 an optimizing construction in the original Cartesian
 geometry of the image data, and relaxing the distance
 alters the conceptual context according to which that choice
 of distance itself came to be justified.
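 To make "an optimizing construction in the original Cartesian
 geometry" concrete, here is a minimal sketch in Python/NumPy of
 superimposing one landmark configuration on another: center both,
 scale each to unit centroid size, then choose the least-squares
 rotation via a singular value decomposition.  The function name
 procrustes_align and the toy 6-landmark data are hypothetical
 illustrations, not code from any morphometrics package.  The
 residual norm returned is the partial Procrustes distance; a full
 Procrustes fit would also rescale the fitted configuration, which
 this sketch omits.

```python
import numpy as np

def procrustes_align(A, B):
    """Superimpose landmark configuration B (k x 2) onto A (k x 2).

    Hypothetical helper: centers both, scales each to unit centroid
    size, then applies the least-squares rotation (no reflection).
    Returns the aligned A, the fitted B, and the residual norm.
    """
    A0 = A - A.mean(axis=0)                  # remove position
    B0 = B - B.mean(axis=0)
    A0 /= np.linalg.norm(A0)                 # unit centroid size
    B0 /= np.linalg.norm(B0)
    U, s, Vt = np.linalg.svd(B0.T @ A0)      # SVD of the 2 x 2 cross-product
    R = U @ Vt                               # minimizes ||A0 - B0 R||
    if np.linalg.det(R) < 0:                 # forbid reflections
        U[:, -1] *= -1
        R = U @ Vt
    B_fit = B0 @ R
    return A0, B_fit, np.linalg.norm(A0 - B_fit)

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 2))                  # hypothetical 6-landmark configuration
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
B = 3.0 * A @ rot + np.array([5.0, -2.0])    # same shape, new position/size/orientation
A0, B_fit, d_same = procrustes_align(A, B)

C = A + 0.05 * rng.normal(size=A.shape)      # a slightly different shape
_, _, d_diff = procrustes_align(A, C)
print(np.isclose(d_same, 0.0), d_diff > 0)   # True True
```

 Position, scale and orientation are removed by construction, so
 only a genuine change of shape leaves a nonzero residual; that
 optimization is what the Procrustes distance means.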
        Remember that the Procrustes family of techniques has
 two exactly equipotent purposes: sorting the organisms
 (the task we call ordination), but also ordinating the
 space of possible measurements (the space of "shape
 variables" dual to the coordinate space).  The underlying
 beauty of the Procrustes toolkit is its reinterpretation of
 the original Kendall geometry in a multivariate context that
 makes this duality possible.  Remember, too, that MDS
 itself arose originally in the social sciences, where the
 data to which it was applied had no geometry of their own.
        There is no natural match between the two approaches to
 the meaning of "distances," which
 correspond to two completely different styles of statistical science.
 The relaxation of the distance metric that Jim published
 in 1972, as applied to Procrustes shape coordinates (or, for
 that matter, to the new size-shape coordinates), breaks the
 connection between those two purposes.  While the results of
 a metric MDS on shape coordinates are interpretable as
 rotations of the Procrustes shape coordinates, the axes
 of a nonmetric MDS are not interpretable as shape variables
 at all.  One arrives at an ordination, yes, wherein nearby
 specimens are more similar, are perhaps usefully clustered -- but
 the mathematical basis for the selection of the distance
 measure being _sent_ for MDS has been removed, and with
 it the possibility that the resulting coordinates have further
 formal properties independent of the name and the author of the
 computer program that generated them.
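 The claim that metric MDS of shape coordinates merely rotates
 them can be checked numerically.  The sketch below assumes
 Euclidean distances computed directly from a matrix of (already
 superimposed) shape coordinates; it runs classical metric MDS
 (principal coordinates) on those distances and compares the result
 with principal component scores of the coordinates themselves.  The
 random matrix X is a hypothetical stand-in for real Procrustes
 coordinates, and the helper names are illustrative only.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (metric) MDS, i.e. principal coordinates, of distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]          # keep the k largest
    return evecs[:, order] * np.sqrt(evals[order])

def pca_scores(X, k):
    """Principal component scores of the rows of X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 16))    # hypothetical: 30 specimens x 16 shape coordinates
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

k = 3
Y_mds = classical_mds(D, k)
Y_pca = pca_scores(X, k)

# Eigenvectors are defined only up to sign, so align signs column by column
signs = np.sign((Y_mds * Y_pca).sum(axis=0))
print(np.allclose(Y_mds * signs, Y_pca))     # True

# Each MDS axis is an exact linear combination of the original coordinates,
# i.e. it is itself a shape variable
Xc = X - X.mean(axis=0)
coef, *_ = np.linalg.lstsq(Xc, Y_mds, rcond=None)
print(np.allclose(Xc @ coef, Y_mds))         # True
```

 The second check is the duality at issue: the metric ordination
 axes can be written back as weighted sums of the original
 coordinates.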
         It is not solely a matter of style or taste, then,
 but also a matter of fundamental scientific methodology, to
 ask: what is the impact upon ultimate scientific interpretation
 of this abandonment of the duality between specimens and variables
 that is woven into the foundation of the Procrustes techniques?
 The original question to which Jim and I both responded asked
 (among other things) about the correspondence of shape
 distance with geographical distance.  The Procrustes
 method allows, for instance, the coordinates of a PCOORD plot to be
 correlated with latitude and longitude, for description
 of actual gradients or clines (the approach we call
 "singular warps").  But this separate
 optimization, the PLS analysis of map coordinates
 against shape coordinates, is no longer interpretable when
 the input shape representation is a nonmetric MDS output instead of
 an exact re-projection of the initial Procrustes metric geometry.
 It is no longer talking about explaining shape covariance;
 it is, in fact, no longer talking about "explaining" anything at all
 in the sense of "explanation" that we all borrow from the world
 of linear models.
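 For readers who have not met it, the PLS step can be sketched as a
 singular value decomposition of the cross-covariance matrix between
 a block of shape coordinates and a block of map coordinates.  In the
 sketch below, the function name two_block_pls, the simulated
 latitude/longitude values and the toy cline are all hypothetical.
 Each singular vector on the shape side is a pattern of coordinate
 displacements, which is meaningful only when the shape block is a
 linear re-expression of the Procrustes coordinates.

```python
import numpy as np

def two_block_pls(X, Z):
    """Two-block PLS via the SVD of the cross-covariance of X (shape) and Z (map)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    C = Xc.T @ Zc / (n - 1)                   # p x q cross-covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T
    return U, s, V, Xc @ U, Zc @ V            # singular vectors and paired scores

rng = np.random.default_rng(1)
n = 40
Z = rng.uniform(size=(n, 2))                  # hypothetical latitude, longitude
pattern = rng.normal(size=16)                 # hypothetical cline pattern
X = np.outer(Z[:, 0], pattern) + 0.1 * rng.normal(size=(n, 16))  # toy shape coordinates

U, s, V, x_scores, z_scores = two_block_pls(X, Z)

# Each singular value equals the covariance of its paired scores
cov1 = np.cov(x_scores[:, 0], z_scores[:, 0])[0, 1]
print(np.isclose(cov1, s[0]))                 # True

# Share of the total squared cross-covariance carried by the first pair
print(s[0] ** 2 / np.sum(s ** 2))
```

 Here U[:, 0] recovers the simulated cline pattern (up to sign), a
 reading that depends entirely on X being made of coordinates.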
          We have prior experience with
 a similar topic, the relation of the Procrustes coordinates
 to the resistant-fit version of superposition.  Resistant fit occasionally produces
 interpretable diagrams (yes, Virginia, there really is a
 Pinocchio), but at the considerable cost of
 blocking any subsequent multivariate analysis.  It didn't
 matter in the original (1982) publications of the resistant-fit
 method, but it came to matter shortly afterwards, as the
 synthesis emerged of which I am speaking, in which shape
 coordinates serve two roles, not only one.  By now, in 2005,
 I am not aware of any scientific uses of the resistant-fit
 methods -- the price is simply too high.  In my view
 the road through nonmetric MDS is likewise a biometrical
 dead-end. If there is any further interpretation of the
 resulting diagrams, it is not via statistical properties
 conveyed by the actual plotted point locations.  Those 
 properties have been dissolved by the software.  And it is in
 fact the same price: the breaking of the tie between the
 coordinates of the final diagrams and their interpretation
 as shape variables.
         Yes, the technique of nonmetric MDS, which does indeed
 downplay the weight of the most different shapes vis-a-vis those
 closer together, furthers one of the core purposes
 of numerical taxonomy, namely, cluster-based classification.
 This is at the cost of several other purposes for which no
 techniques existed in 1972 but that are now available
 _simultaneously_ in the Procrustes toolkit.  Jim appropriately 
 mentions ordination as one purpose that can be served even when the
 formal symmetries of the Procrustes approach are relaxed
 in this particular way.  But he did not go on to indicate
 as well the costs of the practice he recommended, which is
 to say, the breaking of the formal geometric tie between
 object coordinates and measurable variables that gives the toolkit as a whole
 its power.  Once this tie is broken, it can't be restored
 by any subsequent multivariate maneuver. In 1972, there
 was no such tie to worry about.  The intervening third
 of a century has given us additional multivariate power
 that should not be set aside lightly.
        But then (my final comment) this is a strange place
 to start relinquishing the Procrustes approach. If you are
 willing to relax Procrustes distance against a general
 monotone function, why on earth are you using Procrustes
 distance in the first place?  Why relax _that_ assumption,
 instead of the symmetry over landmarks and directions
 (the assumption of the offset isotropic Gaussian model)
 that is so much more obviously violated in realistic
 data sets?  The Procrustes metric is "what we mean
 by shape distance" only if you accept the stringent
 symmetries that made Kendall's original insights
 feasible.  If you are going to start altering assumptions,
 monotone transformations of a still totally symmetric
 formalism are a strange, I would say non-biological,
 place to begin.  Better to consider the original landmark
 scheme itself (something that went unmentioned and of
 course unfigured in the original post to which we are
 both responding), and ask instead the kinds of questions
 for which we _do_ have answers within the perseverating
 Procrustes framework: questions about anisotropy of landmark
 variability, about landmarks vs. semilandmarks, and the
 other tools that are compatible with the original 
 mathematics.  The nonmetric MDS, which predates the whole
 Procrustes approach, has no handles by which to pick up
 these additional tools. 
      
         I look forward to additional comments on this
 theme.  Last year my Vienna group and I published a comment on the vicissitudes
 of morphometrics, arising in numerical taxonomy but now
 centered (at least in Europe)
 in physical anthropology, evolutionary biology and evo-devo.  
 The innovation of which I'm speaking here might be
 intrinsic to this translation: the emergence of a duality between
 descriptors of specimens and descriptors of quantitative
 measures per se, a duality for which classic numerical
 taxonomy never seemed to have much use.  This is
 not a criticism of the taxonomy itself, of course, only
 a comment on the corresponding limitations of subsequent
 quantitative scientific context. 

 Fred Bookstein
      
-- 
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
