[The following was submitted by Dr. Fred L. Bookstein in response to the query posted by Dr. Thomas M. Greiner. It follows with some requested editing and reformatting by me. -the morphmet moderator (dslice)]
Procrustes distance is one of a large family of occasionally reasonable measures that permit a sorting of all pairs of specimens from your original data set in decreasing order of an intuitively sensible kind of "similarity." The Procrustes distance corresponds to a geometry in which each shape is a point on some curving manifold, and the intuition about similarity comes from the claim that one is free to move a specimen, resize a specimen, or rotate a specimen in any way whatsoever without any effect on that subjective "similarity," which otherwise should come as close as possible to ordinary Euclidean distance (but why?). But whether or not Procrustes distance is a defensible idea, most of our biometric statistical methods don't work with distances, which is to say that they don't work on curving manifolds, anyway. Instead they work only in linear spaces that require data sets to be expressed as vectors of variable values, not matrices of interspecimen distances. The tangent spaces you're asking about are exactly the spaces that do this translating. You need one of these spaces if you are going to look at correlations between shape and its causes or effects, or if you are trying to look at modularity or integration of form, symmetry and asymmetry, growth-gradients or other geometric descriptors of factors of form, reconstruction of forms or estimation of missing data, and the like. You don't need this space for significance tests of distances (similarities) across groups, and there is nothing either "conservative" or "liberal" about it. The appropriate questions of that sort deal with the selection of the Procrustes metric itself, not the conversion to vectors. You are "conservative" if you think that a similarity measure must have certain mathematical properties; you are "liberal" if you think the scientist is free to measure any way he or she wishes.
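To make the invariances concrete, here is a minimal sketch in Python with NumPy. It is not part of the original reply: the function names `preshape` and `procrustes_distance` are illustrative, and the choice of the angular (Riemannian) form of the distance, with reflections excluded, is an assumption made for the example.

```python
import numpy as np

def preshape(X):
    """Center a (landmarks x dims) configuration and scale it to unit centroid size."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

def procrustes_distance(X, Y):
    """Angular Procrustes distance between the shapes of X and Y.

    Position and size are removed by preshape(); rotation is removed by
    maximizing the inner product over proper rotations via an SVD.
    """
    A, B = preshape(X), preshape(Y)
    U, s, Vt = np.linalg.svd(A.T @ B)
    # Exclude reflections: flip the smallest singular value if needed.
    s[-1] *= np.sign(np.linalg.det(U @ Vt))
    return np.arccos(np.clip(s.sum(), -1.0, 1.0))

# A square, a moved/resized/rotated copy of it, and a rectangle.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * square @ R + np.array([5.0, -2.0])
rectangle = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])

print(procrustes_distance(square, moved))      # zero up to rounding error
print(procrustes_distance(square, rectangle))  # clearly positive
```

The output is a single number per pair of specimens, which is all a distance can ever give you: it says how dissimilar the square and the rectangle are, but not in what respect.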
The particular tangent space you are wrestling with is, by theorem, the (unique) linear space that supplies the best approximation to the original Procrustes distances near the Procrustes average form. If you don't need it, don't use it. For instance, if all you're doing is testing differences of group average shape, you don't need the tangent space; if you want to see a group average shape, you don't need it (although it gets the right answer); if you want to just look at two average shapes and write an essay, you don't need it. But if you want to see a quantitative dissection of the relation between two shapes, or two average shapes, or see how something correlates with a shape difference, or visualize a principal coordinate ( = principal component = relative warp) of shape, you do need it. The choice of tangent space vs. shape metric is driven by the nature of the question and the sense of "similarity" that the scientist is using, not by the data per se and certainly not by the statistics of the data. The language of Mahalanobis D's has no relation to this framework -- it conveys answers to questions about statistical distributions of vectors, not about similarities -- nor does the language of statistical significance tests. You need the tangent space for ordination, and you need it for quantitative description of differences beyond the ineffable "similarity" that goes into those distance matrices. So the question is most emphatically NOT "when is shape space projection required or when is the tangent space projection sufficient?" but "when is analysis of my arbitrary 'dissimilarity' good enough, and when do I need vectors instead?" If distances are enough, you don't ever need a projection, but of course you then need to argue (1) why it is sufficient just to talk about dissimilarities, and (2) why the Procrustes distance is the one you should be using. (This is a difficult argument to win.)
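For readers who want to see the approximation numerically, the following is a rough sketch in Python with NumPy, not a reference implementation. Everything in it is an assumption made for illustration: the helper names, the simulated five-landmark sample, the fixed number of Procrustes iterations, and the use of an orthogonal projection onto the tangent space at the sample average.

```python
import numpy as np

def preshape(X):
    """Center a (landmarks x dims) configuration and scale to unit centroid size."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

def rotate_onto(A, M):
    """Rotate preshape A to its least-squares fit on M (proper rotations only)."""
    U, _, Vt = np.linalg.svd(A.T @ M)
    D = np.eye(A.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))
    return A @ U @ D @ Vt

def procrustes_mean(configs, n_iter=10):
    """Iterative Procrustes average; returns the mean and the aligned preshapes."""
    shapes = [preshape(X) for X in configs]
    mean = shapes[0]
    for _ in range(n_iter):
        aligned = [rotate_onto(A, mean) for A in shapes]
        mean = preshape(np.mean(aligned, axis=0))
    return mean, [rotate_onto(A, mean) for A in shapes]

def tangent_coords(aligned, mean):
    """Orthogonal projection of aligned preshapes onto the tangent space at the mean."""
    m = mean.ravel()
    X = np.array([A.ravel() for A in aligned])
    return X - np.outer(X @ m, m)

def procrustes_distance(A, B):
    """Angular Procrustes distance between two preshapes."""
    U, s, Vt = np.linalg.svd(A.T @ B)
    s[-1] *= np.sign(np.linalg.det(U @ Vt))
    return np.arccos(np.clip(s.sum(), -1.0, 1.0))

# Hypothetical sample: 20 noisy copies of one five-landmark form.
rng = np.random.default_rng(0)
ref = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 1.0], [1.5, 2.5], [-0.5, 1.0]])
configs = [ref + 0.05 * rng.normal(size=ref.shape) for _ in range(20)]

mean, aligned = procrustes_mean(configs)
V = tangent_coords(aligned, mean)  # one row vector per specimen

n = len(aligned)
gaps = [abs(np.linalg.norm(V[i] - V[j]) - procrustes_distance(aligned[i], aligned[j]))
        for i in range(n) for j in range(i + 1, n)]
print("largest gap between tangent-space and Procrustes distance:", max(gaps))
```

The rows of `V` are ordinary vectors, so they can be fed to regression, principal components, and the like, while the pairwise distances among them stay close to the Procrustes distances as long as the sample is concentrated near its average.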
If you need vectors, you have to get them from some metric space using some projection. The Procrustes tangent space is the one in which vector sums-of-squares come closest to minimum Euclidean sums-of-squares in the (infinitesimal) vicinity of the sample average shape (which can be determined without any mention of tangent space), and it generates a great number of geometrically valid diagrams (uniform shape changes, relative warps, semilandmarks, thin-plate splines, etc.) that often extend the scientist's original intuition of what shape similarity was supposed to entail. In short, if you only care about significance tests, you have no use for tangent spaces, only for distance matrices. If you want to understand biological processes, you can't really make any progress just by talking about similarities; you need the language of vector spaces, and the tangent space to the Kendall shape manifold is the optimal vector space for most of these purposes.

Fred Bookstein

--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
