# Re: [MORPHMET] Interpreting PCA results

Dear Mahendiran,

As I understand David's phrasing, it is just a way to visualize shape variation along a given PC axis (a score of 0 on the axis corresponds to the mean shape). One could use some other criterion instead, for instance the maximum and minimum scores along that PC.

But, of course, all the other considerations about whether it makes sense to interpret (or at least explore) the patterns predicted along a given PC axis still apply.
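To make the two criteria concrete, here is a minimal sketch in Python with NumPy, assuming the landmarks have already been Procrustes-superimposed. The function names (`fit_shape_pca`, `shape_at_score`) and the random stand-in data are hypothetical, used only for illustration; this is not the procedure of any particular morphometrics package.

```python
import numpy as np

def fit_shape_pca(aligned):
    """PCA of aligned landmarks with shape (n_specimens, n_landmarks, n_dims)."""
    n = aligned.shape[0]
    X = aligned.reshape(n, -1)                 # one row of coordinates per specimen
    mean = X.mean(axis=0)                      # mean shape = score 0 on every PC
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = (X - mean) @ Vt.T                 # PC scores of each specimen
    sd = s / np.sqrt(n - 1)                    # standard deviation of the scores on each PC
    return mean, Vt, scores, sd

def shape_at_score(mean, Vt, pc, score, landmark_shape):
    """Shape predicted at a given score along one PC (all other PCs held at 0)."""
    return (mean + score * Vt[pc]).reshape(landmark_shape)

# Hypothetical data: 30 specimens, 10 landmarks in 2D, standing in for aligned coordinates
rng = np.random.default_rng(0)
aligned = rng.normal(size=(30, 10, 2))
mean, Vt, scores, sd = fit_shape_pca(aligned)

# Criterion 1: mean shape warped by -2 and +2 standard deviations along PC1
minus_2sd = shape_at_score(mean, Vt, 0, -2 * sd[0], (10, 2))
plus_2sd = shape_at_score(mean, Vt, 0, +2 * sd[0], (10, 2))

# Criterion 2: shapes at the minimum and maximum observed PC1 scores
at_min = shape_at_score(mean, Vt, 0, scores[:, 0].min(), (10, 2))
at_max = shape_at_score(mean, Vt, 0, scores[:, 0].max(), (10, 2))
```

Either way, the shapes being drawn are just points on the PC axis mapped back to landmark coordinates; the choice of 2 standard deviations here is arbitrary.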

Best,

Carmelo

On 15/05/2017 4:30 PM, mahendiran mylswamy wrote:
Dear all,
I find David's question interesting.
Could anyone answer David's question in a simple way?
I am quoting his question below.

"What I do not quite understand is what exactly is the purpose of applying standard deviation(s) to the PCA and then warping the Procrustes average shape to these standard deviations?"

On 15 May 2017 6:08 a.m., "F. James Rohlf" <f.james.ro...@stonybrook.edu> wrote:
```
I agree with these comments but would like to add another point. I
prefer to think that the purpose of the PCA is to produce a
low-dimension space that captures as much of the overall variation
(in a least-squares sense) as possible. Within that space there is
no need to limit the visualizations to the extremes of each axis –
one can investigate any direction within that space if there is a
pattern in the data that suggests an interesting direction. The
directions of the axes are mathematical constructs and not based
on any biological principles. Perhaps one sees some clusters in
the PCA ordination but the variation within or between clusters
need not be parallel to one of the PC axes. One can then look in
other directions. That is why the tpsRelw program allows one to
visualize any point in the ordination space – not just parallel to
the axes. That means for publication one has to decide which
directions are of interest – not just mechanically display the
extremes of the axes.

----------------------

F. James Rohlf *New email: f.james.ro...@stonybrook.edu*

Distinguished Professor, Emeritus. Dept. of Ecol. & Evol.

& Research Professor. Dept. of Anthropology

Stony Brook University 11794-4364

WWW: http://life.bio.sunysb.edu/morph/rohlf

*From:* K. James Soda <k.jamess...@gmail.com>
*Sent:* Sunday, May 14, 2017 7:28 PM
*To:* dsbriss_dmd <orthofl...@gmail.com>
*Cc:* MORPHMET <morphmet@morphometrics.org>
*Subject:* Re: [MORPHMET] Interpreting PCA results

Dear David,

Great question!  I disagree with the statement that the samples'
variance in shape space is not biologically real or, perhaps more
accurately, is less real than the variance in any other space.  As
far as I see it, the basic strategy in any biostatistical study,
be it GM or otherwise, is that a researcher represents a real
biological population as an abstract statistical population.  They
then use this abstract statistical population as a proxy for the
real one so that inferences in the statistical space have
implications for the real world.

For example, a PCA finds a direction in the statistical space in
which the statistical population tends to be spread out.  This is
interesting to the researcher because this direction has a
correspondence to certain real world variables.  As a result, the
PCA tells the researcher in what ways the real population tends to
vary.  The key point, though, is that the researcher transitioned
from the statistical space to the real world.

Moving from shape space to the real world is no different in
principle.  We have a real population of specimens, whose shapes
are of interest to us, and we represent them using vectors of
shape variables.  The vectors are abstractions; it is not as if we
can hold a vector in our hands. However, this is irrelevant
because they are just proxies, no less real than any other
quantitative representation.  What matters is if we can tie them
back to the real world.  This is why morphometricians implement a
visualization step. In a PCA, the PCs describe how our proxies
vary, and we visualize in order to see how this variation appears
in the real world.  It is infeasible to visualize every point
along this axis, so we instead visualize a handful.  Since the
core goal in PCA (at least in this context) is to describe
variance, we generally describe the locations where a
visualization occurs in units of standard deviations from the
mean.  We could use absolute distances along an axis, but this is
probably more arbitrary than standard deviation units.  The
standard deviations come from the data's distribution, whereas the
absolute distance is really only well-defined in the mathematical
space.

To summarize: i) Nearly all quantitative analyses involve an
abstraction to a mathematical space. ii) The description of points
in a mathematical space is useful to the researcher because the
researcher is able to translate the abstract mathematical space
into a real world interpretation.  iii) In GM, the shape variables
are traditionally translated into the real world via
visualization.  Ergo, morphometricians often interpret PCA results
via visualizations along individual PCs.  To aid in
interpretation, this tends to occur in standard deviation units
because the standard deviation is more easily tied to the real
world than an arbitrarily selected unit of distance.

Perhaps some of these points are up for debate, but remember that
statistics is largely the study of VARIATION.  If the variation in
shape space did not have any biological significance, almost no
analysis after alignment would be possible.

I hope that somewhere in this long commentary you found something helpful,

James

On Tue, May 9, 2017 at 4:56 PM, dsbriss_dmd <orthofl...@gmail.com> wrote:

Good afternoon all, I have a question about interpretation of
PCs.  I have come across several articles in orthodontic
literature having to do with morphometric analysis of sagittal
cephalograms that discuss warping a Procrustes analysis along
a principal component axis.  Essentially the authors discuss
finding whatever principal components represent shape
variance, then determining the standard deviation(s) of those
PCs, and applying the standard deviations to the Procrustes
shape to warp the average shape plus or minus.  So if you have
an average normodivergent Procrustes shape, one warp perhaps
in the negative direction might give you a brachycephalic
shape, while the opposite would give you a dolichocephalic
shape.  But I don't know where this idea comes from.  I have
been involved with 8 or 9 morphometrics projects over the last
few years and I have never been able to figure this out or the
rationale for performing such an application with the PC results.

As an example of what I am talking about, here is a passage
from the Journal of Clinical & Diagnostic Research, doi:
10.7860/JCDR/2015/8971.5458
<https://dx.doi.org/10.7860%2FJCDR%2F2015%2F8971.5458>

"Here, the first 2 PCs are shown & the Average shape (middle)
was warped by applying each PC by amount equal to 3 standard
deviations in negative (left) and positive (right) direction
{[Table/Fig-10
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347171/figure/F10/>]:
PC1 with standard deviation, [Table/Fig-11
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347171/figure/F11/>]
PC 2 with standard deviation}."

I did not include the graphs from the article but if it would
help to answer this question I can supply them.

What I do not quite understand is what exactly is the purpose
of applying standard deviation(s) to the PCA and then warping
the Procrustes average shape to these standard deviations?
Maybe my understanding of PCA is limited, but I was under the
impression that in GPA the principal components are only
statistical variance, and don't represent something
biologically real.  So to see how an individual varies from
the shape average you have to go back and look at whatever
landmark(s) represent that specific individual and compare
that shape to the Procrustes average.  Maybe this is not correct?

David
```
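Rohlf's remark in the thread above, that one can visualize any point in the ordination space and not only positions along a single axis, can be sketched in Python with NumPy. This is an illustration under stated assumptions, not how tpsRelw is implemented: the helper `shape_at_point`, the two-group stand-in data, and the choice of the line between group centroids as the "interesting direction" are all hypothetical.

```python
import numpy as np

def shape_at_point(mean, Vt, point, landmark_shape):
    """Shape at `point`, given as scores on the first len(point) PCs (the rest held at 0)."""
    point = np.asarray(point, dtype=float)
    return (mean + point @ Vt[: point.size]).reshape(landmark_shape)

# Hypothetical stand-in for aligned coordinates: two groups of 15 specimens, 8 landmarks in 2D
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 15)
aligned = rng.normal(scale=0.05, size=(30, 8, 2))
aligned[group == 1, 0, 0] += 0.3     # group 1 differs at the first landmark

X = aligned.reshape(30, -1)
mean = X.mean(axis=0)
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
scores = (X - mean) @ Vt.T

# Group centroids in the PC1-PC2 plane; the line joining them need not follow either axis
c0 = scores[group == 0, :2].mean(axis=0)
c1 = scores[group == 1, :2].mean(axis=0)

# Shapes at evenly spaced points from one centroid to the other
path = [shape_at_point(mean, Vt, c0 + t * (c1 - c0), (8, 2)) for t in np.linspace(0, 1, 5)]
```

Passing a point with a single nonzero coordinate recovers the usual "along one PC" visualization, so the per-axis warps are a special case of this mapping.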
--
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
---
You received this message because you are subscribed to the Google Groups "MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email to morphmet+unsubscr...@morphometrics.org.