[SYSTEMML-1144] Fix PCA documentation for principal

Update 'principle' to 'principal'.
Closes #311.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/8b917582
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/8b917582
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/8b917582

Branch: refs/heads/gh-pages
Commit: 8b917582dfdae9dc001115ea3376e94d7f49e2d2
Parents: fa88464
Author: Deron Eriksson <[email protected]>
Authored: Thu Dec 8 13:24:29 2016 -0800
Committer: Deron Eriksson <[email protected]>
Committed: Thu Dec 8 13:24:29 2016 -0800

----------------------------------------------------------------------
 Algorithms Reference/PCA.tex       | 16 ++++++++--------
 algorithms-matrix-factorization.md | 28 ++++++++++++++--------------
 algorithms-reference.md            |  2 +-
 3 files changed, 23 insertions(+), 23 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/Algorithms Reference/PCA.tex
----------------------------------------------------------------------
diff --git a/Algorithms Reference/PCA.tex b/Algorithms Reference/PCA.tex
index 5895502..cef750e 100644
--- a/Algorithms Reference/PCA.tex
+++ b/Algorithms Reference/PCA.tex
@@ -19,12 +19,12 @@ \end{comment}
-\subsection{Principle Component Analysis}
+\subsection{Principal Component Analysis}
 \label{pca}
 \noindent{\bf Description}
-Principle Component Analysis (PCA) is a simple, non-parametric method to transform the given data set with possibly correlated columns into a set of linearly uncorrelated or orthogonal columns, called {\em principle components}. The principle components are ordered in such a way that the first component accounts for the largest possible variance, followed by remaining principle components in the decreasing order of the amount of variance captured from the data. PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by top-$K$ (for a given value of $K$) principle components.
+Principal Component Analysis (PCA) is a simple, non-parametric method to transform the given data set with possibly correlated columns into a set of linearly uncorrelated or orthogonal columns, called {\em principal components}. The principal components are ordered in such a way that the first component accounts for the largest possible variance, followed by remaining principal components in the decreasing order of the amount of variance captured from the data. PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by top-$K$ (for a given value of $K$) principal components.
 \\
 \noindent{\bf Usage}
@@ -45,10 +45,10 @@ Principle Component Analysis (PCA) is a simple, non-parametric method to transfo
 \begin{itemize}
 \item INPUT: Location (on HDFS) to read the input matrix.
-\item K: Indicates dimension of the new vector space constructed from $K$ principle components. It must be a value between $1$ and the number of columns in the input data.
-\item CENTER (default: {\tt 0}): Indicates whether or not to {\em center} input data prior to the computation of principle components.
-\item SCALE (default: {\tt 0}): Indicates whether or not to {\em scale} input data prior to the computation of principle components.
-\item PROJDATA: Indicates whether or not the input data must be projected on to new vector space defined over principle components.
+\item K: Indicates dimension of the new vector space constructed from $K$ principal components. It must be a value between $1$ and the number of columns in the input data.
+\item CENTER (default: {\tt 0}): Indicates whether or not to {\em center} input data prior to the computation of principal components.
+\item SCALE (default: {\tt 0}): Indicates whether or not to {\em scale} input data prior to the computation of principal components.
+\item PROJDATA: Indicates whether or not the input data must be projected on to new vector space defined over principal components.
 \item OFMT (default: {\tt csv}): Specifies the output format. Choice of comma-separated values (csv) or as a sparse-matrix (text).
 \item MODEL: Either the location (on HDFS) where the computed model is stored; or the location of an existing model.
 \item OUTPUT: Location (on HDFS) to store the data rotated on to the new vector space.
@@ -56,7 +56,7 @@ Principle Component Analysis (PCA) is a simple, non-parametric method to transfo
 \noindent{\bf Details}
-Principle Component Analysis (PCA) is a non-parametric procedure for orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized -- this is referred to as the first principle component. It then repeatedly finds other directions (principle components) in which the variance is maximized. At every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the notion of redundancy, consider an extreme scenario with a data set comprising of two variables, where the first one denotes some quantity expressed in meters, and the other variable represents the same quantity but in inches. Both these variables evidently capture redundant information, and hence one of them can be removed. In a general scenario, keeping solely the linear combination of input variables would both express the data more concisely and reduce the number of variables. This is why PCA is often used as a dimensionality reduction technique.
+Principal Component Analysis (PCA) is a non-parametric procedure for orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized -- this is referred to as the first principal component. It then repeatedly finds other directions (principal components) in which the variance is maximized. At every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the notion of redundancy, consider an extreme scenario with a data set comprising of two variables, where the first one denotes some quantity expressed in meters, and the other variable represents the same quantity but in inches. Both these variables evidently capture redundant information, and hence one of them can be removed. In a general scenario, keeping solely the linear combination of input variables would both express the data more concisely and reduce the number of variables. This is why PCA is often used as a dimensionality reduction technique.
 The specific method to compute such a new coordinate system is as follows -- compute a covariance matrix $C$ that measures the strength of correlation among all pairs of variables in the input data; factorize $C$ according to eigen decomposition to calculate its eigenvalues and eigenvectors; and finally, order eigenvectors in the decreasing order of their corresponding eigenvalue. The computed eigenvectors (also known as {\em loadings}) define the new coordinate system and the square root of eigen values provide the amount of variance in the input data explained by each coordinate or eigenvector.
 \\
@@ -112,7 +112,7 @@ The specific method to compute such a new coordinate system is as follows -- com
 \noindent{\bf Returns}
 When MODEL is not provided, PCA procedure is applied on INPUT data to generate MODEL as well as the rotated data OUTPUT (if PROJDATA is set to $1$) in the new coordinate system.
-The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation MODEL$/dominant.eigen.standard.deviations$ of principle components.
+The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation MODEL$/dominant.eigen.standard.deviations$ of principal components.
 When MODEL is provided, INPUT data is rotated according to the coordinate system defined by MODEL$/dominant.eigen.vectors$. The resulting data is stored at location OUTPUT.
 \\

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/algorithms-matrix-factorization.md
----------------------------------------------------------------------
diff --git a/algorithms-matrix-factorization.md b/algorithms-matrix-factorization.md
index 2ed8a49..51eb614 100644
--- a/algorithms-matrix-factorization.md
+++ b/algorithms-matrix-factorization.md
@@ -25,20 +25,20 @@ limitations under the License.
 # 5 Matrix Factorization
-## 5.1 Principle Component Analysis
+## 5.1 Principal Component Analysis
 ### Description
-Principle Component Analysis (PCA) is a simple, non-parametric method to
+Principal Component Analysis (PCA) is a simple, non-parametric method to
 transform the given data set with possibly correlated columns into a set
-of linearly uncorrelated or orthogonal columns, called *principle
-components*. The principle components are ordered in such a way
+of linearly uncorrelated or orthogonal columns, called *principal
+components*. The principal components are ordered in such a way
 that the first component accounts for the largest possible variance,
-followed by remaining principle components in the decreasing order of
+followed by remaining principal components in the decreasing order of
 the amount of variance captured from the data.
 PCA is often used as a dimensionality reduction technique, where the original data is projected or rotated onto a low-dimensional space with basis vectors defined by
-top-$K$ (for a given value of $K$) principle components.
+top-$K$ (for a given value of $K$) principal components.
 ### Usage
@@ -80,19 +80,19 @@ top-$K$ (for a given value of $K$) principle components.
 **INPUT**: Location (on HDFS) to read the input matrix.
 **K**: Indicates dimension of the new vector space constructed from $K$
- principle components. It must be a value between `1` and the number
+ principal components. It must be a value between `1` and the number
 of columns in the input data.
 **CENTER**: (default: `0`) `0` or `1`. Indicates whether or not to *center* input data prior to the computation of
- principle components.
+ principal components.
 **SCALE**: (default: `0`) `0` or `1`. Indicates whether or not to *scale* input data prior to the computation of
- principle components.
+ principal components.
 **PROJDATA**: `0` or `1`. Indicates whether or not the input data must be projected
- on to new vector space defined over principle components.
+ on to new vector space defined over principal components.
 **OFMT**: (default: `"csv"`) Matrix file output format, such as `text`, `mm`, or `csv`; see read/write functions in
@@ -170,7 +170,7 @@ SystemML Language Reference for details.
 #### Details
-Principle Component Analysis (PCA) is a non-parametric procedure for
+Principal Component Analysis (PCA) is a non-parametric procedure for
 orthogonal linear transformation of the input data to a new coordinate system, such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal
@@ -178,8 +178,8 @@ component), the second greatest variance on the second coordinate, and
 so on. In other words, PCA first selects a normalized direction in $m$-dimensional space ($m$ is the number of columns in the input data) along which the variance in input data is maximized – this is referred
-to as the first principle component. It then repeatedly finds other
-directions (principle components) in which the variance is maximized. At
+to as the first principal component. It then repeatedly finds other
+directions (principal components) in which the variance is maximized. At
 every step, PCA restricts the search for only those directions that are perpendicular to all previously selected directions. By doing so, PCA aims to reduce the redundancy among input variables. To understand the
@@ -211,7 +211,7 @@ OUTPUT (if PROJDATA is set to $1$) in the new coordinate system.
 The produced model consists of basis vectors MODEL$/dominant.eigen.vectors$ for the new coordinate system; eigen values MODEL$/dominant.eigen.values$; and the standard deviation
-MODEL$/dominant.eigen.standard.deviations$ of principle components. When
+MODEL$/dominant.eigen.standard.deviations$ of principal components. When
 MODEL is provided, INPUT data is rotated according to the coordinate system defined by MODEL$/dominant.eigen.vectors$. The resulting data is stored at location OUTPUT.

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/8b917582/algorithms-reference.md
----------------------------------------------------------------------
diff --git a/algorithms-reference.md b/algorithms-reference.md
index 244b882..26c2141 100644
--- a/algorithms-reference.md
+++ b/algorithms-reference.md
@@ -48,7 +48,7 @@ limitations under the License.
   * [Regression Scoring and Prediction](algorithms-regression.html#regression-scoring-and-prediction)
 * [Matrix Factorization](algorithms-matrix-factorization.html)
-  * [Principle Component Analysis](algorithms-matrix-factorization.html#principle-component-analysis)
+  * [Principal Component Analysis](algorithms-matrix-factorization.html#principal-component-analysis)
   * [Matrix Completion via Alternating Minimizations](algorithms-matrix-factorization.html#matrix-completion-via-alternating-minimizations)
 * [Survival Analysis](algorithms-survival-analysis.html)
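
For readers of the documentation touched by this diff, the procedure that the "Details" section describes (optionally center and scale the data, compute the covariance matrix, eigen-decompose it, order eigenvectors by decreasing eigenvalue, and optionally project the data) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not part of this commit and not SystemML's PCA.dml script; the names pca, K, center, and scale merely mirror the documented arguments.

# Minimal PCA sketch (illustrative, assumes NumPy); mirrors the documented
# steps: CENTER/SCALE preprocessing, covariance matrix C, eigen decomposition,
# ordering by eigenvalue, and PROJDATA-style rotation onto the top-K basis.
import numpy as np

def pca(X, K, center=True, scale=False):
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)            # CENTER: subtract column means
    if scale:
        X = X / X.std(axis=0, ddof=1)     # SCALE: divide by column std devs

    # Covariance matrix C measuring correlation strength among all column pairs
    C = np.cov(X, rowvar=False)

    # Eigen decomposition of the symmetric covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(C)

    # Order eigenvectors by decreasing eigenvalue and keep the top K (loadings)
    order = np.argsort(eig_vals)[::-1][:K]
    eig_vals = eig_vals[order]
    eig_vecs = eig_vecs[:, order]

    std_devs = np.sqrt(eig_vals)          # std deviation of each principal component
    projected = X @ eig_vecs              # PROJDATA: rotate data onto the new basis
    return eig_vecs, eig_vals, std_devs, projected

For example, pca(X, K=2, center=True) returns the top-2 loadings, their eigenvalues and standard deviations, and the rotated data, loosely corresponding to the documented MODEL/dominant.eigen.vectors, MODEL/dominant.eigen.values, MODEL/dominant.eigen.standard.deviations, and OUTPUT results.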
