[ https://issues.apache.org/jira/browse/SYSTEMML-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frederick Reiss updated SYSTEMML-1146:
--------------------------------------
Assignee: Prithviraj Sen
> Improve PCA description in documentation
> ----------------------------------------
>
> Key: SYSTEMML-1146
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1146
> Project: SystemML
> Issue Type: Improvement
> Components: Documentation
> Reporter: Deron Eriksson
> Assignee: Prithviraj Sen
> Priority: Minor
>
> David P. Nichols reports that the first sentence of the PCA description in
> the Algorithms Reference is inaccurate
> (http://apache.github.io/incubator-systemml/algorithms-matrix-factorization.html#principal-component-analysis).
> "Principal Component Analysis (PCA) is a simple, non-parametric method to
> transform the given data set with possibly correlated columns into a set of
> linearly uncorrelated or orthogonal columns, called principal components."
> The problem with this statement is that principal component scores are
> typically not uncorrelated unless the input data have been centered (or
> already had column means of 0). Orthogonal and uncorrelated are not the same
> thing: whether two vectors are orthogonal is a function of the raw values,
> while covariance, and hence correlation, is a function of the centered values.
> It looks like the text was taken from Wikipedia's Principal component
> analysis entry. Whoever wrote that part of the entry seems to assume that
> principal component analysis always operates on a matrix of centered (or
> centered and scaled) data, but that is not always the case. The default in
> SystemML is not to center input columns, so the resulting columns are
> typically not uncorrelated, though they are orthogonal.
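
The distinction between orthogonal and uncorrelated can be checked numerically. The following is a minimal NumPy sketch, not SystemML's PCA script: it assumes that "PCA without centering" means projecting the raw data matrix onto its right singular vectors, and the helper names (pca_scores, max_offdiag) and the synthetic data are made up for illustration.

```python
import numpy as np

def pca_scores(X, center):
    """Project X onto its right singular vectors, optionally centering columns first."""
    Xc = X - X.mean(axis=0) if center else X
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U * s  # same as Xc @ Vt.T

def max_offdiag(M):
    """Largest absolute off-diagonal entry of a square matrix."""
    return np.abs(M - np.diag(np.diag(M))).max()

# Synthetic data (assumption): correlated columns with non-zero column means.
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ mix + np.array([5.0, -3.0, 2.0])

raw = pca_scores(X, center=False)  # no centering
cen = pca_scores(X, center=True)   # columns centered first

# Orthogonality is a property of the raw score values (Gram matrix T^T T).
raw_gram_offdiag = max_offdiag(raw.T @ raw)
cen_gram_offdiag = max_offdiag(cen.T @ cen)

# Correlation is a property of the centered score values (covariance matrix).
raw_cov_offdiag = max_offdiag(np.cov(raw, rowvar=False))
cen_cov_offdiag = max_offdiag(np.cov(cen, rowvar=False))

print("uncentered: gram off-diag", raw_gram_offdiag, "cov off-diag", raw_cov_offdiag)
print("centered:   gram off-diag", cen_gram_offdiag, "cov off-diag", cen_cov_offdiag)
```

Under these assumptions, the uncentered scores have a numerically diagonal Gram matrix (orthogonal columns) but a covariance matrix with clearly non-zero off-diagonal entries, while the centered scores are both orthogonal and uncorrelated.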