October 7, 2005
This is a response to Sandhya's query of this morning
about PLS (partial least squares).
Unfortunately, there are two distinct (and nearly incompatible)
meanings for this phrase.
In morphometrics I am, I think, responsible for one of them,
which centers on the interpretation of the
singular values of a cross-block covariance matrix as the
covariances of the pairs of linear combinations that use the singular
vectors as coefficients. There are at least four different
and logically equivalent interpretations of the algebra of this
approach. For a recent review, see Sampson and Bookstein,
"Partial least squares," in B. Everitt, D. Howell, and
C. Lunneborg, eds., {\sl Encyclopedia of Behavioral
Statistics}, 2005. This little article includes many
warnings, mainly that you should use PLS only when you
have fairly strong prior knowledge that the underlying
notion of a "factor regression" is factually true (meaning,
in practice, that your cross-block covariance matrix is
very close to one of quite low rank, 1 or 2).
Note also the venue of the publication: behavioral statistics.
We usually have much more powerful models in connection
with real biometric data. Because the SVD is essentially
a least-squares approximation to a covariance matrix, this
flavor of PLS is mainly a guide to interpreting and
sorting variables within lists, rather than a source of
any insight into their values in individual cases.
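The algebra of this first flavor can be checked numerically: center both blocks, take the SVD of the cross-block covariance matrix, and use the singular vectors as coefficients. The covariance of each resulting pair of scores is then the corresponding singular value, and different pairs are uncorrelated across blocks. What follows is a minimal sketch in Python with NumPy, assuming cases in rows and an n - 1 divisor for covariances. The function names and the simulated one-factor data are hypothetical and serve only as an illustration.

```python
import numpy as np

def two_block_pls(X, Y):
    """Two-block PLS: SVD of the cross-block covariance matrix of X (n x p) and Y (n x q).

    Returns singular values d, coefficient vectors U for the X block
    (columns) and V for the Y block (columns).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center every variable
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (n - 1)                  # p x q cross-block covariance matrix
    U, d, Vt = np.linalg.svd(C, full_matrices=False)
    return d, U, Vt.T

def pls_scores(X, Y, U, V):
    """Linear combinations of each block that use the singular vectors as coefficients."""
    return (X - X.mean(axis=0)) @ U, (Y - Y.mean(axis=0)) @ V

# Hypothetical data: both blocks driven by one shared factor plus noise.
rng = np.random.default_rng(0)
n = 50
factor = rng.normal(size=n)
X = np.outer(factor, [1.0, 0.5, -0.3]) + 0.2 * rng.normal(size=(n, 3))
Y = np.outer(factor, [0.8, -0.6]) + 0.2 * rng.normal(size=(n, 2))

d, U, V = two_block_pls(X, Y)
tx, ty = pls_scores(X, Y, U, V)
# Covariance of the first pair of scores equals the first singular value.
print(d, np.cov(tx[:, 0], ty[:, 0])[0, 1])
```

When the blocks really do share a single factor, as in this simulation, the first singular value would typically dominate the second. The warning above concerns data for which that near-low-rank structure is not actually present.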
Another approach, alas also called PLS, was introduced by
Svante Wold, a chemometrician, in the 1970s,
and then the name was re-used for a structural
equations algorithm (previously, and most unfortunately,
named "NIPALS," "nonlinear iterative partial least squares")
put forward by his father, the Swedish econometrician
Herman Wold, at about the same time. Both of these are for
the purpose of regression (i.e. prediction of variables), but
only Svante's, in my view, has a rigorous algebraic setting.
It is one member of the family of extended regression techniques,
including Total Least Squares, that in essence try to minimize
some combination of prediction error and uncertainty about
coefficients simultaneously. I am aware of no
distributional models under which this PLS gets "the right
answer" for the predicted value -- the search for such
a model preoccupied several of us during the 1970s, but
ended up mainly a waste of time -- yet it is often considered
useful in applications that lack theory. By comparison, the least-squares
exegesis of covariance structures that underlies "my" PLS
is typically more important for interpretation than for prediction.
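For readers who have met only the regression flavor, a common textbook formulation of Svante Wold's PLS regression for a single response (often called PLS1) extracts components one at a time by sequential deflation. This is the single-response case of NIPALS, where no inner iteration is needed. The sketch below is a minimal version in Python with NumPy. The function name pls1 and the simulated data are hypothetical. Note that the number of components is a choice the user must make.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS regression for one response by sequential deflation.

    Returns coefficients b and intercept b0 such that y_hat = X @ b + b0.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean                       # predictor block, deflated step by step
    f = y - y_mean                       # response, deflated step by step
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)        # weights: direction of maximal covariance with y
        t = E @ w                        # component scores
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        q = (f @ t) / tt                 # y loading
        E = E - np.outer(t, p)           # remove this component from X
        f = f - q * t                    # and from y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # coefficients on the original predictors
    return b, y_mean - x_mean @ b

# Hypothetical data; the number of components is left to the user.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 3.0 + 0.1 * rng.normal(size=30)
b, b0 = pls1(X, y, n_components=2)
print(b, b0)
```

With as many components as predictors (and X of full column rank), the coefficients coincide with ordinary least squares. Fewer components restrict the coefficient vector to the span of the first few weight vectors.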
But it is not obvious to me what either of these has to
do with the task of missing-landmark estimation. As far
as I know, there are just two principled approaches here:
maximum-likelihood and minimum-bending. PLS is not
a maximum-likelihood (i.e. scientifically coherent)
method for quantifying anything, as far as I am aware
(every version I've ever seen is merely least-squares
in something, without any role for actual knowledge), and
so it won't be equivalent to the canonical EM methods that
maximize the posterior probability of the completed shape coordinate
distribution as a whole. Neither does PLS have any way to
use the geometric theorems about bending energy that drive
the elegant smoothing properties afforded by a thin-plate
spline interpolation from a mean form in the same context.
Instead, any application of PLS that I can
imagine would involve many purely ad-hoc assumptions
(cutoffs of dimensionality, weights, etc.), and any regular
reader of this email group already knows my preferences about
ad-hoc versus theorem-based methods in this or indeed any context.
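The minimum-bending alternative mentioned above can be sketched concretely in three steps. First, take a reference form, for instance the sample mean form, assumed already superimposed with the target. Second, map it onto the target by the two-dimensional thin-plate spline through the landmarks the target actually has. Third, take the images of the reference's copies of the missing landmarks as the estimates. What follows is a minimal sketch in Python with NumPy. It assumes 2-D landmarks and at least three non-collinear observed landmarks. The function names are hypothetical, and the code illustrates the idea rather than reproducing any published implementation.

```python
import numpy as np

def tps_kernel(r):
    """2-D thin-plate spline kernel U(r) = r^2 log r, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r**2 * np.log(r)
    return np.where(r > 0, u, 0.0)

def tps_fit(src, dst):
    """Solve for the thin-plate spline taking landmarks src (k x 2) exactly onto dst (k x 2)."""
    k = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((k, 1)), src])         # affine terms 1, x, y
    L = np.zeros((k + 3, k + 3))
    L[:k, :k], L[:k, k:], L[k:, :k] = K, P, P.T
    coef = np.linalg.solve(L, np.vstack([dst, np.zeros((3, 2))]))
    return src, coef[:k], coef[k:]                # knots, nonaffine weights, affine part

def tps_apply(fit, pts):
    """Map points (m x 2) through a fitted spline."""
    knots, w, a = fit
    U = tps_kernel(np.linalg.norm(pts[:, None, :] - knots[None, :, :], axis=-1))
    return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a

def estimate_missing(reference, target, missing):
    """Warp reference (e.g. a mean form) onto the target's observed landmarks,
    then map the reference copies of the missing landmarks."""
    present = np.setdiff1d(np.arange(len(reference)), missing)
    fit = tps_fit(reference[present], target[present])
    return tps_apply(fit, reference[missing])

# Hypothetical example: the target is an affine image of the reference,
# with landmarks 2 and 5 missing (NaN).
reference = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5],
                      [1.5, 1.5], [0.5, 1.8], [-0.3, 1.0]])
A = np.array([[1.1, 0.2], [-0.1, 0.9]])
target = reference @ A.T + np.array([0.5, -0.2])
missing = np.array([2, 5])
target[missing] = np.nan
print(estimate_missing(reference, target, missing))
```

The spline reproduces affine maps with zero bending energy, so an affine image of the reference gets its missing landmarks back exactly. With real data the estimates are only as good as the choice of reference form.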
So I am curious as to whether anyone has earlier claimed in
print (or in public email postings like these) that PLS
_does_ have something to do with estimating missing data
in morphometrics, and, if so, what the claim was and
what the justification was. We don't have good models
of how data _go_ missing in morphometrics -- this isn't
like nonresponse in survey research -- and so it's not
clear what "merits" an estimator is supposed to have.
No computational algorithm for scientific application
can properly be assigned either merits or demerits except
with respect to a specific pair of true models, one for signal
and one for noise, in the context of which it is supposed
to afford scientific insight. Tell us what missing-data estimation
is supposed to mean, scientifically -- give us a model
for the signal, and (this is mandatory) also for the
noise (including both what goes missing and what is variable
within the part that is not missing) -- and we applied
statisticians can tell you (sometimes) about algorithms
that can be proved to work properly under some circumstances.
Those circumstances become the assumptions you must make,
to which the rest of us will invariably raise objections.
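One standard textbook pairing of signal and noise models looks like this. Model the (already superimposed) coordinates as multivariate Gaussian, and assume that values are missing at random. Under exactly those assumptions the EM algorithm iterates toward maximum-likelihood estimates of the mean and covariance, and the conditional means it imputes can serve as estimates of the missing values. What follows is a minimal sketch in Python with NumPy. It is a generic EM, not a morphometric method from the literature. The function name em_gaussian and the example data are hypothetical, and every row needs at least one observed value. The sketch also ignores the singular covariance that real Procrustes coordinates have, which an actual application would have to handle.

```python
import numpy as np

def em_gaussian(X, n_iter=200):
    """EM for the mean and covariance of a multivariate Gaussian when some
    entries of X (n x p) are missing (NaN), assuming they are missing at random.

    Returns the completed data (missing entries replaced by conditional means),
    the mean vector and the covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)                       # start from mean imputation
    sigma = np.cov(Xf, rowvar=False, bias=True)
    for _ in range(n_iter):
        extra = np.zeros((p, p))                     # conditional covariance of imputed entries
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T   # regression of missing on observed
            # E-step: conditional mean of the missing entries given the observed ones
            Xf[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: maximum-likelihood mean and covariance of the completed data
        mu = Xf.mean(axis=0)
        centered = Xf - mu
        sigma = (centered.T @ centered + extra) / n
    return Xf, mu, sigma

# Hypothetical example: the second coordinate is exactly 2 * first + 1,
# and three of its values are missing.
x1 = np.arange(20.0)
X = np.column_stack([x1, 2 * x1 + 1])
X[[3, 8, 15], 1] = np.nan
Xf, mu, sigma = em_gaussian(X, n_iter=300)
print(Xf[[3, 8, 15], 1])
```

In this toy example the second coordinate is an exact linear function of the first, so the imputed values should be recovered essentially exactly. The interesting cases, and the objections, arise when the model is only an approximation.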
Best regards for your project.
Fred Bookstein
--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org