> [Moderator: Made public as per Igor's suggestion. Followup 
> to...well...follow.]
...snip...

With respect to the use of partial or relative warp scores in
statistics, many common multivariate statistical procedures are
invariant to the orthogonal axes used to record specimen observations.
For instance, assuming the use of a full set of basis vectors for your
sample variation:

Principal components analysis (PCA) - this finds the orthogonal
linear combinations of your original variables, ordered by the variance
of the projections of the data onto them. The results (plots and
eigenvalues, but not eigenvector coefficients, whose values are
variable-specific) will be identical (up to a reflection) regardless of
whether you use partial warps, Procrustes residuals, or any other
appropriately sized set of orthogonal axes spanning the space. These
results will be, in turn, the same as an alpha=0 relative warps analysis
(which is just a PCA of partial warp scores).
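
For the skeptical, a minimal numpy sketch (the data are invented and
the variable count arbitrary) showing that PCA eigenvalues and scores
are unchanged by an orthogonal change of basis:

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented data: 30 specimens, 6 variables (stand-ins for partial
    # warp scores, Procrustes residuals on a full basis, etc.).
    X = rng.normal(size=(30, 6)) @ np.diag([3.0, 2.0, 1.5, 1.0, 0.5, 0.2])
    Xc = X - X.mean(axis=0)

    # A random orthogonal change of basis - a different complete set
    # of axes for the same space.
    Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
    Yc = Xc @ Q

    def pca_eigvals(Z):
        # Eigenvalues of the sample covariance, largest first.
        return np.sort(np.linalg.eigvalsh(np.cov(Z, rowvar=False)))[::-1]

    def pca_scores(Z):
        # Projections of the data onto their own principal axes.
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        return Z @ Vt.T

    print(np.allclose(pca_eigvals(Xc), pca_eigvals(Yc)))  # True
    print(np.allclose(np.abs(pca_scores(Xc)),
                      np.abs(pca_scores(Yc))))   # True, up to reflection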

Canonical variates analysis (CVA) - this is just a PCA of group means
AFTER factoring out the within-group covariance. This makes the
within-group scatter of the transformed sample statistically circular.
You'll get the same results regardless of your starting axes - the
noncircular within-group variation you factor out is affected only in
its direction, not its substance, by a different choice of axes, and
you then do a PCA on the result - see (PCA).
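
Here is a sketch of that two-step view, again in numpy with invented
data; the whitening step assumes the pooled within-group covariance is
nonsingular:

    import numpy as np

    def cva(X, g):
        groups = np.unique(g)
        # Pooled within-group covariance W.
        resid = np.vstack([X[g == k] - X[g == k].mean(axis=0)
                           for k in groups])
        W = resid.T @ resid / (len(X) - len(groups))
        # Whiten so the within-group scatter becomes circular.
        evals, evecs = np.linalg.eigh(W)
        T = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # W^(-1/2)
        # PCA of the whitened group means gives the canonical axes.
        M = np.vstack([X[g == k].mean(axis=0) for k in groups]) @ T
        _, _, Vt = np.linalg.svd(M - M.mean(axis=0), full_matrices=False)
        # Return the g-1 canonical vectors in the original space.
        return (T @ Vt.T)[:, :len(groups) - 1]

    rng = np.random.default_rng(1)
    g = np.repeat([0, 1, 2], 20)
    X = rng.normal(size=(60, 4)) + np.eye(4)[g] * np.array([2.0, 4.0, 6.0, 0.0])
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # change of basis

    S1 = (X - X.mean(axis=0)) @ cva(X, g)
    S2 = (X @ Q - (X @ Q).mean(axis=0)) @ cva(X @ Q, g)
    print(np.allclose(np.abs(S1), np.abs(S2)))     # True: same scores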

Multivariate analysis of variance (MANOVA) - this is just a comparison
of residual and "explained" variation depending on the model - e.g.,
within- and between-group variance for tests of group mean differences.
The standard statistics - Wilks' lambda, Pillai's trace, etc. - just use
the eigenvalues (products, sums, maxima) of the within and between
(unexplained and explained) covariation, and are, like PCA, invariant
to the actual basis.
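
For concreteness, a sketch of those statistics computed directly from
the eigenvalues (this assumes W is invertible - see the singularity
discussion below):

    import numpy as np

    def manova_stats(X, g):
        groups = np.unique(g)
        grand = X.mean(axis=0)
        # Within (W) and between (B) sums of squares and cross-products.
        W = sum((X[g == k] - X[g == k].mean(axis=0)).T
                @ (X[g == k] - X[g == k].mean(axis=0)) for k in groups)
        B = sum(len(X[g == k])
                * np.outer(X[g == k].mean(axis=0) - grand,
                           X[g == k].mean(axis=0) - grand) for k in groups)
        # The standard statistics are all functions of the
        # eigenvalues of W^-1 B.
        lam = np.linalg.eigvals(np.linalg.solve(W, B)).real
        wilks = np.prod(1.0 / (1.0 + lam))    # product
        pillai = np.sum(lam / (1.0 + lam))    # sum
        roy = lam.max()                       # maximum
        return wilks, pillai, roy

    rng = np.random.default_rng(2)
    g = np.repeat([0, 1], 25)
    X = rng.normal(size=(50, 5)) + g[:, None]
    Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    print(np.allclose(manova_stats(X, g),
                      manova_stats(X @ Q, g)))   # True: basis-invariant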

NOTE: All of the above assumes the use of either a) any complete set of
basis vectors for the space under consideration or b) a complete set for
the subspace "occupied" by your sample.

I believe the original question that started all of this was about
discriminant functions. This is a bit different. If the goal of a
discriminant function analysis is the classification of new specimens,
then, in a sense, it matters not what you do: the validity of whatever
is done is determined solely by its ability to correctly classify new
specimens. You could use the first few PCs, every third partial warp,
neural networks, or tarot cards. That which does the best job is the
best of what you have tried.
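
If you want to compare candidates on exactly that criterion,
cross-validated classification rates are the standard tool. A sketch
using scikit-learn (the data here are invented; in practice X and y
would be your shape variables and group labels):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(3)
    y = np.repeat([0, 1], 25)
    X = rng.normal(size=(50, 8)) + 0.8 * y[:, None]

    # Candidate 1: LDA on the first 3 PCs; candidate 2: LDA on all
    # variables. Whichever classifies held-out specimens better wins.
    candidates = [
        ("PCA(3) + LDA", make_pipeline(PCA(n_components=3),
                                       LinearDiscriminantAnalysis())),
        ("LDA on all variables", LinearDiscriminantAnalysis()),
    ]
    for name, model in candidates:
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(name, round(acc, 2))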

If you wish to divine something about biology from the procedure, you
have a bit more of a problem. You may be able to find some
justification, perhaps even a good one, for a particular linear
combination of variables to have biological, functional, or evolutionary
meaning, but it won't be in the mathematics - and having done so, you
will have defined some other variable(s) that are probably more
interesting and defensible anyway - relative body depth, spine length,
etc.

The original post was concerned with the problem of small samples and
large numbers of variables. In some sense this is just "mechanical" in
that statistics programs want to compute proper inverses of covariance
matrices and cannot/will not execute if said covariance matrix is
singular. Small n is a problem for any large suite of variables, but
Procrustes residuals are guaranteed to have this problem regardless of
the number of specimens used - degrees of freedom are lost to the
superimposition, so the superimposed data set occupies only a subspace
of that of the original data, and all covariance matrices computed in
this space from the full complement of residuals will be singular. The
use of partial warp scores addresses this issue in that they, plus the
uniform components, are of the proper dimension. In that case, sample
size then becomes (somewhat) critical again.
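
The loss is easy to see numerically. A sketch with made-up 2D landmark
data and a simple superimposition onto a fixed reference (center,
scale to unit centroid size, rotate): the covariance matrix of the
superimposed coordinates has rank 2k - 4, not 2k.

    import numpy as np

    rng = np.random.default_rng(4)
    k, n = 5, 50                     # 5 landmarks in 2D, 50 specimens
    ref = rng.normal(size=(k, 2))    # a made-up reference form
    ref -= ref.mean(axis=0)
    ref /= np.linalg.norm(ref)

    def superimpose(cfg, ref):
        # Center, scale to unit centroid size, rotate onto the
        # reference (reflection check omitted for brevity).
        cfg = cfg - cfg.mean(axis=0)
        cfg = cfg / np.linalg.norm(cfg)
        u, _, vt = np.linalg.svd(ref.T @ cfg)
        return cfg @ (u @ vt).T

    X = np.array([superimpose(ref + 0.005 * rng.normal(size=(k, 2)),
                              ref).ravel() for _ in range(n)])
    evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    # Count the eigenvalues that are not (numerically) zero:
    print(np.sum(evals > evals.max() * 1e-3), "of", 2 * k)  # 6 of 10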

To address the mechanical problems of computing discriminant functions
or anything else that wants to invert covariance matrices, there are
several things one can do. First, and simplest, is to do a PCA and
proceed with scores on PCs with non-zero eigenvalues. This guarantees
that the space in which you are working will be at least minimally
filled. If you are doing something using within-group covariance, then
you can use no more PCs than one less than your smallest sample size -
an occasionally very restrictive requirement. The other alternative is
to "roll your own": copy the relevant formulae out of the statistical
texts and replace things like S^(-1) with S^(-) or S^(G), i.e.,
substitute a generalized inverse for the proper inverse shown in most
formulae. A good generalized inverse can be constructed from the SVD of
S, where you replace the nonzero singular values with their inverses
and reconstitute the matrix using only the vectors associated with the
nonzero singular values. For other cases like MANOVA, you can do
randomization tests comparing traces of within- or between-group SS
with a large number of samples whose association with the relevant
variables has been randomized.
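
For the "roll your own" route, that SVD construction is a few lines of
numpy (and is essentially what np.linalg.pinv does for you anyway):

    import numpy as np

    def ginv(S, rtol=1e-10):
        # Generalized inverse from the SVD: invert only the nonzero
        # singular values and reconstitute the matrix from their
        # associated vectors.
        U, s, Vt = np.linalg.svd(S)
        keep = s > s.max() * rtol
        return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 8))          # n < p, so S is singular
    S = np.cov(X, rowvar=False)
    print(np.linalg.matrix_rank(S))                  # 4, not 8
    print(np.allclose(ginv(S), np.linalg.pinv(S)))   # True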

Another comment - even if you have enormous samples far exceeding the
number of variables, and these variables have not been constrained by
forces like a superimposition operation, covariance matrices can still
be singular. This is because organisms are not random variables with
unconstrained covariance. High correlations amongst variables (which
are at least partially required to produce recognizably similar
organisms) can isolate sample variation to a small subspace of variable
space. That is, after all, the secret to the utility and efficacy of
PCA.
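
A quick illustration: ten thousand specimens, six variables, but only
two underlying factors, and the covariance matrix is still singular:

    import numpy as np

    rng = np.random.default_rng(6)
    z = rng.normal(size=(10_000, 2))   # two underlying factors
    X = z @ rng.normal(size=(2, 6))    # six observed variables, fully
                                       # determined by the two factors
    S = np.cov(X, rowvar=False)
    print(np.linalg.matrix_rank(S))    # 2, not 6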

And a final comment on PCAs. I find it generally underappreciated (even
unknown) amongst biologists that if any of the eigenvalues are
identical, then the eigenvectors associated with them are not unique.
They are just a constructed basis of that subspace of "round" variation
and are freely rotatable within it. As it happens, rounding error and
such seldom produce exactly identical numerical values, so the
resulting PCs may look unique, but they are so only due to the rounding
error. This situation can occur at any scale of variation, from the
first few PCs to the last or anywhere in between.
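
One can see this directly: build a covariance matrix with a repeated
eigenvalue, and any rotation of the tied eigenvectors within their
subspace is an equally valid set of axes:

    import numpy as np

    # Two tied eigenvalues (1.0, 1.0) and one distinct (0.5).
    S = np.diag([1.0, 1.0, 0.5])
    evals, evecs = np.linalg.eigh(S)   # ascending: 0.5, 1.0, 1.0

    # Rotate the two tied eigenvectors by an arbitrary angle.
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    V = evecs.copy()
    V[:, 1:] = V[:, 1:] @ R

    # Still a perfectly good eigenvector set, same eigenvalues:
    print(np.allclose(S @ V, V * evals))   # True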

Oh my, I have gone on and written enough that at least something must
surely be wrong. I am confident that others will step in and point out
my errors.

Best, dslice
-- 
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
