October 7, 2005
 
      This is a response to Sandhya's query of this morning
 about PLS (partial least squares).  

      Unfortunately, there are two distinct (and nearly incompatible)
 meanings for this phrase. 

       In morphometrics I am responsible for one of them,
 I think, which centers around the interpretation of the
 singular values of a cross-block covariance matrix as the
 covariances of the linear combinations that use the singular
 vectors as coefficients.  There are at least four different
 and logically equivalent interpretations of the algebra of this
 approach. For a recent review, see Sampson and Bookstein,
 "Partial least squares," in B. Everitt, D. Howell, and
 C. Lunneborg, eds., _Encyclopedia of Behavioral
 Statistics_, 2005.  This little article includes many
 warnings, mainly that you should use PLS only when you
 have fairly strong prior knowledge that the underlying
 notion of a "factor regression" is factually true (meaning,
 in practice, that your cross-block covariance matrix is
 very close to one of quite low rank, 1 or 2).
 Note also the venue of the publication: behavioral statistics.
 We usually have much more powerful models in connection
 with real biometric data. Because the SVD is essentially 
 a least-squares approach to a covariance matrix, this
 flavor of PLS is mainly a guide to interpretation and
 sorting of variables within lists, rather than affording
 any insight into their values in individual cases.
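
      To make the first meaning concrete, here is a minimal NumPy
 sketch under stated assumptions: the data are simulated from a
 single shared latent variable, and every name in it (X, Y,
 scores_x, and so on) is invented for illustration, not taken from
 the encyclopedia article.  It forms the cross-block covariance
 matrix, takes its SVD, and checks that each singular value equals
 the covariance between the paired linear combinations whose
 coefficients are the singular vectors.

```python
import numpy as np

# Simulated two-block data driven by one shared latent variable
# (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.outer(latent, [1.0, 0.5, -0.5]) + 0.3 * rng.normal(size=(n, 3))
Y = np.outer(latent, [0.8, -0.4]) + 0.3 * rng.normal(size=(n, 2))

# Center each block and form the 3 x 2 cross-block covariance matrix.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C = Xc.T @ Yc / (n - 1)

# SVD of the cross-block covariance matrix.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Linear combinations that use the singular vectors as coefficients.
scores_x = Xc @ U
scores_y = Yc @ Vt.T

# Covariance between each pair of scores; these match the singular values.
score_cov = np.array(
    [np.cov(scores_x[:, i], scores_y[:, i])[0, 1] for i in range(len(s))]
)

# Share of total squared covariance carried by the first pair: a rough
# check on whether C is close to rank 1.
first_share = s[0] ** 2 / np.sum(s ** 2)
```

 The quantity first_share is one simple way to ask whether the
 cross-block covariance matrix is close to one of low rank; what
 counts as "close" remains a judgment call.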

      Another approach, alas also called PLS, was introduced by
 Svante Wold, a chemometrician, in the 1970s,
 and then the name was re-used for a structural
 equations algorithm (previously, and most unfortunately,
 named "NIPALS," "nonlinear iterative partial least squares")
 put forward by his father, the Swedish econometrician
 Herman Wold, at about the same time.  Both of these are for
 the purpose of regression (i.e. prediction of variables), but
 only Svante's, in my view, has a rigorous algebraic setting.
 It is one member in the family of extended regression techniques
 including Total Least Squares that, in essence, try to minimize
 some combination of prediction error and uncertainty about
 coefficients at the same time.  I am aware of no
 distributional models under which this PLS gets "the right
 answer" for the predicted value -- the search for such
 a model preoccupied several of us during the 1970s, but
 ended up mainly a waste of time -- but it is often considered
 useful in applications that lack theory. By comparison, the least-squares
 exegesis of covariance structures that underlies "my" PLS
 is typically more important for interpretation than for prediction.
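
      For contrast, the regression flavor can be sketched as
 follows.  This is one common textbook form of single-response PLS
 regression (often called PLS1), written with NumPy on simulated
 data; it is an assumption that this form is representative of the
 chemometric usage, and the function name pls1_fit and all
 variables are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Textbook PLS1 with deflation; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr = X - x_mean          # residual predictors, deflated at each step
    yr = y - y_mean          # residual response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                     # component scores
        tt = t @ t
        p = Xr.T @ t / tt              # loadings of X on the component
        c = yr @ t / tt                # regression of y on the component
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients on original X
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Simulated data (illustrative assumption only).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

beta_one, b0_one = pls1_fit(X, y, 1)     # one component: a shrunken fit
beta_full, b0_full = pls1_fit(X, y, 3)   # all components

# Ordinary least squares on centered data, for comparison.
Xc = X - X.mean(axis=0)
beta_ols = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)[0]
```

 With as many components as predictors the fit coincides with
 ordinary least squares; with fewer it differs, and the number of
 components is a choice the user must make.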

      But it is not obvious to me what either of these has to
 do with the task of missing landmark estimation. As far
 as I know, there are just two principled approaches here:
 maximum-likelihood and minimum-bending.  PLS is not
 a maximum-likelihood (i.e. scientifically coherent)
 method for quantifying anything, as far as I am aware
 (every version I've ever seen is merely least-squares
 in something, without any role for actual knowledge), and
 so it won't be equivalent to the canonical EM methods that
 maximize the posterior probability of the completed shape coordinate 
 distribution as a whole.  Neither does PLS have any way to
 use the geometric theorems about bending energy that drive
 the elegant smoothing properties afforded by a thin-plate
 spline interpolation from a mean form in the same context. 
 Instead, any application of PLS that I can
 imagine would involve many purely ad-hoc assumptions
 (cutoffs of dimensionality, weights, etc.), and any regular
 reader of this email group already knows my preferences about
 ad-hoc versus theorem-based methods in this or indeed any context.
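
      As an illustration of the minimum-bending approach, here is a
 minimal sketch, assuming two-dimensional landmarks, a known mean
 form, and the thin-plate spline kernel U(r) = r^2 log r^2.  The
 spline is fitted from the mean form to the specimen on the
 landmarks that are present, and the missing landmarks are read off
 as the images of their mean-form positions.  The function names
 and the coordinates are hypothetical.

```python
import numpy as np

def tps_kernel(r2):
    """Thin-plate spline kernel U = r^2 log r^2, with U(0) = 0."""
    out = np.zeros_like(r2)
    pos = r2 > 0
    out[pos] = r2[pos] * np.log(r2[pos])
    return out

def estimate_missing_tps(mean_form, specimen):
    """Fill NaN rows of specimen (k x 2) by a TPS map from mean_form (k x 2)."""
    missing = np.isnan(specimen).any(axis=1)
    src, dst = mean_form[~missing], specimen[~missing]
    n = src.shape[0]
    # Bordered system [[K, P], [P^T, 0]] [w; a] = [dst; 0].
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = tps_kernel(d2)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    coef = np.linalg.solve(L, rhs)
    w, a = coef[:n], coef[n:]          # nonaffine weights, affine part
    # Map the mean-form positions of the missing landmarks.
    q = mean_form[missing]
    d2q = ((q[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    est = tps_kernel(d2q) @ w + np.hstack([np.ones((len(q), 1)), q]) @ a
    completed = specimen.copy()
    completed[missing] = est
    return completed

# Hypothetical mean form and a specimen that is an exact affine image of it.
mean_form = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],
                      [0.0, 1.0], [0.5, 0.5], [0.3, 0.8]])
A = np.array([[1.2, 0.3], [-0.1, 0.9]])
truth = mean_form @ A.T + np.array([2.0, -1.0])
specimen = truth.copy()
specimen[5] = np.nan                   # pretend the last landmark is missing
completed = estimate_missing_tps(mean_form, specimen)
```

 In this toy case the specimen is a purely affine image of the mean
 form, so the bending energy is zero and the missing landmark is
 recovered exactly; with real data the estimate is only the
 smoothest interpolation consistent with the landmarks that remain.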
      
       So I am curious as to whether anyone has earlier claimed in
 print (or in public email postings like these) that PLS
 _does_ have something to do with estimating missing data
 in morphometrics, and, if so, what the claim was and
 how it was justified.  We don't have good models
 of how data _go_ missing in morphometrics -- this isn't
 like nonresponse in survey research -- and so it's not
 clear what "merits" an estimator is supposed to have.
 No computational algorithm for scientific application can be
 properly assigned either merits or demerits outside a specific pair
 of true models, signal and noise, in the context of which it is supposed
 to afford scientific insight.  Tell us what a missing-data estimation
 is supposed to mean, scientifically -- give us a model
 for the signal, and (this is mandatory) also for the
 noise (including both what goes missing and what is variable
 within the part that is not missing) -- and we applied
 statisticians can tell you (sometimes) about algorithms 
 that can be proved to work properly under some circumstances.
 Those circumstances become the assumptions you must make,
 to which the rest of us will invariably raise objections.

       Best regards for your project.


 Fred Bookstein
 [EMAIL PROTECTED]
-- 
Replies will be sent to the list.
For more information visit http://www.morphometrics.org
