[SYSTEMML-1144] Fix PCA documentation for principal

Update 'principle' to 'principal'.
Closes #311.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/8b917582
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/8b917582
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/8b917582

Branch: refs/heads/gh-pages
Commit: 8b917582dfdae9dc001115ea3376e94d7f49e2d2
Parents: fa88464
Author: Deron Eriksson <[email protected]>
Authored: Thu Dec 8 13:24:29 2016 -0800
Committer: Deron Eriksson <[email protected]>
Committed: Thu Dec 8 13:24:29 2016 -0800

----------------------------------------------------------------------
 Algorithms Reference/PCA.tex       | 16 ++++++++--------
 algorithms-matrix-factorization.md | 28 ++++++++++++++--------------
 algorithms-reference.md            |  2 +-
 3 files changed, 23 insertions(+), 23 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/Algorithms Reference/PCA.tex
----------------------------------------------------------------------
diff --git a/Algorithms Reference/PCA.tex b/Algorithms Reference/PCA.tex
index 5895502..cef750e 100644
--- a/Algorithms Reference/PCA.tex
+++ b/Algorithms Reference/PCA.tex
@@ -19,12 +19,12 @@ \end{comment}
-\subsection{Principle Component Analysis}
+\subsection{Principal Component Analysis}
 \label{pca}
 \noindent{\bf Description}
-Principle Component Analysis (PCA) is a simple, non-parametric method to transform the given data set with possibly correlated columns into a set of linearly uncorrelated or orthogonal columns, called {\em principle components}. The principle components are ordered in such a way that the first component accounts for the largest possible variance, followed by remaining principle components in the decreasing order of the amount of variance captured from the data. PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by top-$K$ (for a given value of $K$) principle components.
+Principal Component Analysis (PCA) is a simple, non-parametric method to transform the given data set with possibly correlated columns into a set of linearly uncorrelated or orthogonal columns, called {\em principal components}. The principal components are ordered in such a way that the first component accounts for the largest possible variance, followed by remaining principal components in the decreasing order of the amount of variance captured from the data. PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by top-$K$ (for a given value of $K$) principal components.
 \\
 \noindent{\bf Usage}
@@ -45,10 +45,10 @@ Principle Component Analysis (PCA) is a simple, non-parametric method to transfo
 \begin{itemize}
 \item INPUT: Location (on HDFS) to read the input matrix.
-\item K: Indicates dimension of the new vector space constructed from $K$ principle components. It must be a value between $1$ and the number of columns in the input data.
-\item CENTER (default: {\tt 0}): Indicates whether or not to {\em center} input data prior to the computation of principle components.
-\item SCALE (default: {\tt 0}): Indicates whether or not to {\em scale} input data prior to the computation of principle components.
-\item PROJDATA: Indicates whether or not the input data must be projected on to new vector space defined over principle components.
+\item K: Indicates dimension of the new vector space constructed from $K$ principal components. It must be a value between $1$ and the number of columns in the input data.
+\item CENTER (default: {\tt 0}): Indicates whether or not to {\em center} input data prior to the computation of principal components.
+\item SCALE (default: {\tt 0}): Indicates whether or not to {\em scale} input data prior to the computation of principal components.
+\item PROJDATA: Indicates whether or not the input data must be projected on to new vector space defined over principal components.
 \item OFMT (default: {\tt csv}): Specifies the output format. Choice of comma-separated values (csv) or as a sparse-matrix (text).
 \item MODEL: Either the location (on HDFS) where the computed model is stored; or the location of an existing model.
 \item OUTPUT: Location (on HDFS) to store the data rotated on to the new vector space.
@@ -56,7 +56,7 @@ Principle Component Analysis (PCA) is a simple, non-parametric method to transfo
 \noindent{\bf Details}
-Principle Component Analysis (PCA) is a non-parametric procedure for orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized -- this is referred to as the first principle component. It then repeatedly finds other directions (principle components) in which the variance is maximized. At every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the notion of redundancy, consider an extreme scenario with a data set comprising of two variables, where the first one denotes some quantity expressed in meters, and the other variable represents the same quantity but in inches. Both these variables evidently capture redundant information, and hence one of them can be removed. In a general scenario, keeping solely the linear combination of input variables would both express the data more concisely and reduce the number of variables. This is why PCA is often used as a dimensionality reduction technique.
+Principal Component Analysis (PCA) is a non-parametric procedure for orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized -- this is referred to as the first principal component. It then repeatedly finds other directions (principal components) in which the variance is maximized. At every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the notion of redundancy, consider an extreme scenario with a data set comprising of two variables, where the first one denotes some quantity expressed in meters, and the other variable represents the same quantity but in inches. Both these variables evidently capture redundant information, and hence one of them can be removed. In a general scenario, keeping solely the linear combination of input variables would both express the data more concisely and reduce the number of variables. This is why PCA is often used as a dimensionality reduction technique.
 The specific method to compute such a new coordinate system is as follows -- compute a covariance matrix $C$ that measures the strength of correlation among all pairs of variables in the input data; factorize $C$ according to eigen decomposition to calculate its eigenvalues and eigenvectors; and finally, order eigenvectors in the decreasing order of their corresponding eigenvalue. The computed eigenvectors (also known as {\em loadings}) define the new coordinate system and the square root of eigen values provide the amount of variance in the input data explained by each coordinate or eigenvector.
 \\
@@ -112,7 +112,7 @@ The specific method to compute such a new coordinate system is as follows -- com
 \noindent{\bf Returns}
 When MODEL is not provided, PCA procedure is applied on INPUT data to generate MODEL as well as the rotated data OUTPUT (if PROJDATA is set to $1$) in the new coordinate system.
-The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation MODEL$/dominant.eigen.standard.deviations$ of principle components.
+The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation MODEL$/dominant.eigen.standard.deviations$ of principal components.
 When MODEL is provided, INPUT data is rotated according to the coordinate system defined by MODEL$/dominant.eigen.vectors$. The resulting data is stored at location OUTPUT.
 \\

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/algorithms-matrix-factorization.md
----------------------------------------------------------------------
diff --git a/algorithms-matrix-factorization.md b/algorithms-matrix-factorization.md
index 2ed8a49..51eb614 100644
--- a/algorithms-matrix-factorization.md
+++ b/algorithms-matrix-factorization.md
@@ -25,20 +25,20 @@ limitations under the License.
 # 5 Matrix Factorization
-## 5.1 Principle Component Analysis
+## 5.1 Principal Component Analysis
 ### Description
-Principle Component Analysis (PCA) is a simple, non-parametric method to
+Principal Component Analysis (PCA) is a simple, non-parametric method to
 transform the given data set with possibly correlated columns into a set
-of linearly uncorrelated or orthogonal columns, called *principle
-components*. The principle components are ordered in such a way
+of linearly uncorrelated or orthogonal columns, called *principal
+components*. The principal components are ordered in such a way
 that the first component accounts for the largest possible variance,
-followed by remaining principle components in the decreasing order of
+followed by remaining principal components in the decreasing order of
 the amount of variance captured from the data.
 PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by
-top-$K$ (for a given value of $K$) principle components.
+top-$K$ (for a given value of $K$) principal components.
 ### Usage
@@ -80,19 +80,19 @@ top-$K$ (for a given value of $K$) principle components.
 **INPUT**: Location (on HDFS) to read the input matrix.
 **K**: Indicates dimension of the new vector space constructed from $K$
- principle components. It must be a value between `1` and the number
+ principal components. It must be a value between `1` and the number
 of columns in the input data.
 **CENTER**: (default: `0`) `0` or `1`. Indicates whether or not to *center* input data prior to the computation of
- principle components.
+ principal components.
 **SCALE**: (default: `0`) `0` or `1`. Indicates whether or not to *scale* input data prior to the computation of
- principle components.
+ principal components.
 **PROJDATA**: `0` or `1`. Indicates whether or not the input data must be projected
- on to new vector space defined over principle components.
+ on to new vector space defined over principal components.
 **OFMT**: (default: `"csv"`) Matrix file output format, such as `text`, `mm`, or `csv`; see read/write functions in
@@ -170,7 +170,7 @@ SystemML Language Reference for details.
 #### Details
-Principle Component Analysis (PCA) is a non-parametric procedure for
+Principal Component Analysis (PCA) is a non-parametric procedure for
 orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal
@@ -178,8 +178,8 @@ component), the second greatest variance on the second coordinate, and
 so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized – this is referred
-to as the first principle component. It then repeatedly finds other
-directions (principle components) in which the variance is maximized. At
+to as the first principal component. It then repeatedly finds other
+directions (principal components) in which the variance is maximized. At
 every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the
@@ -211,7 +211,7 @@ OUTPUT (if PROJDATA is set to $1$) in the new coordinate system.
 The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation
-MODEL$/dominant.eigen.standard.deviations$ of principle components. When
+MODEL$/dominant.eigen.standard.deviations$ of principal components. When
 MODEL is provided, INPUT data is rotated according to the coordinate system defined by MODEL$/dominant.eigen.vectors$. The resulting data is stored at location OUTPUT.

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/algorithms-reference.md
----------------------------------------------------------------------
diff --git a/algorithms-reference.md b/algorithms-reference.md
index 244b882..26c2141 100644
--- a/algorithms-reference.md
+++ b/algorithms-reference.md
@@ -48,7 +48,7 @@ limitations under the License.
   * [Regression Scoring and Prediction](algorithms-regression.html#regression-scoring-and-prediction)
 * [Matrix Factorization](algorithms-matrix-factorization.html)
-  * [Principle Component Analysis](algorithms-matrix-factorization.html#principle-component-analysis)
+  * [Principal Component Analysis](algorithms-matrix-factorization.html#principal-component-analysis)
   * [Matrix Completion via Alternating Minimizations](algorithms-matrix-factorization.html#matrix-completion-via-alternating-minimizations)
 * [Survival Analysis](algorithms-survival-analysis.html)
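
For readers of the documentation touched by this diff, the procedure that the "Details" section describes (optionally center and scale the data, compute the covariance matrix, eigen-decompose it, order eigenvectors by decreasing eigenvalue, and optionally project the data) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not part of this commit and not SystemML's PCA.dml script; the names pca, K, center, and scale merely mirror the documented arguments.

# Minimal PCA sketch (illustrative, assumes NumPy); mirrors the documented
# steps: CENTER/SCALE preprocessing, covariance matrix C, eigen decomposition,
# ordering by eigenvalue, and PROJDATA-style rotation onto the top-K basis.
import numpy as np

def pca(X, K, center=True, scale=False):
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)            # CENTER: subtract column means
    if scale:
        X = X / X.std(axis=0, ddof=1)     # SCALE: divide by column std devs

    # Covariance matrix C measuring correlation strength among all column pairs
    C = np.cov(X, rowvar=False)

    # Eigen decomposition of the symmetric covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(C)

    # Order eigenvectors by decreasing eigenvalue and keep the top K (loadings)
    order = np.argsort(eig_vals)[::-1][:K]
    eig_vals = eig_vals[order]
    eig_vecs = eig_vecs[:, order]

    std_devs = np.sqrt(eig_vals)          # std deviation of each principal component
    projected = X @ eig_vecs              # PROJDATA: rotate data onto the new basis
    return eig_vecs, eig_vals, std_devs, projected

For example, pca(X, K=2, center=True) returns the top-2 loadings, their eigenvalues and standard deviations, and the rotated data, loosely corresponding to the documented MODEL/dominant.eigen.vectors, MODEL/dominant.eigen.values, MODEL/dominant.eigen.standard.deviations, and OUTPUT results.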
