[ 
https://issues.apache.org/jira/browse/SPARK-16105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346570#comment-15346570
 ] 

Stefan Panayotov commented on SPARK-16105:
------------------------------------------

I understand that the 'reverse' operation maps the 15-dimensional subspace 
back into the 96-dimensional space. As in many data science applications, 
the original higher-dimensional space has domain-specific meaning. The PCA 
model allows us to choose a 15-dimensional subspace that captures most of the 
variance in the 96-dimensional space. While in general the reverse 
transformation is not an 'inverse' operator, in the sense that it is not a 
bijection, it does return the representation of the 15-dimensional vector in 
the 96-dimensional space. A well-trained data scientist knows to inspect the 
impact of their dimensionality reduction in the domain-specific coordinate 
system in order to better understand the implicit assumptions being imposed by 
their pipelines. This is accomplished by applying the reverse operation and 
comparing the results back to the vector-valued column on which the PCA model 
was originally applied.
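
As a rough illustration of the reverse-and-compare step described above, here 
is a minimal pure-Python sketch. It is not Spark code and not a proposed API: 
the helper names (reduce_vector, reverse_vector) and the tiny 3x2 matrix 
standing in for the 96x15 pc matrix are hypothetical, and it assumes the 
columns of pc are orthonormal and that no mean-centering is applied.

```python
import math

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def reduce_vector(pc, x):
    """Forward map into the reduced space: y = pc^T x."""
    return matvec(transpose(pc), x)

def reverse_vector(pc, y):
    """Reverse map back into the original space: x_hat = pc y."""
    return matvec(pc, y)

# Hypothetical 3x2 matrix with orthonormal columns (stand-in for 96x15).
s = 1.0 / math.sqrt(2.0)
pc = [[s, 0.0],
      [s, 0.0],
      [0.0, 1.0]]

x = [3.0, 1.0, 2.0]            # vector in the original space
y = reduce_vector(pc, x)       # 2-dimensional representation
x_hat = reverse_vector(pc, y)  # representation of y in the original space

# Compare the reconstruction against the original vector.
residual = [a - b for a, b in zip(x, x_hat)]
error = math.sqrt(sum(r * r for r in residual))
print("reduced:", y)
print("reconstructed:", x_hat)
print("reconstruction error:", error)
```

In this toy case the reconstruction differs from x, which shows why the 
reverse map is not a true inverse, while reducing x_hat again gives back y. 
The residual is the part of x that the reduced space discards.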

If there is any confusion, I can LaTeX out the mathematics for you.

> PCA Reverse Transformer
> -----------------------
>
>                 Key: SPARK-16105
>                 URL: https://issues.apache.org/jira/browse/SPARK-16105
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 1.6.1
>            Reporter: Stefan Panayotov
>            Priority: Minor
>
> The PCA class has a fit method that returns a PCAModel. One of the members of 
> the PCAModel is pc (the principal components matrix). This matrix is 
> available for inspection, but there is no method that uses it for the reverse 
> transformation back to the original dimension. For example, if I use PCA to 
> reduce the dimensionality of my space from 96 to 15, I get a 96x15 pc matrix. 
> I can do some modeling in my reduced space, and then I need to reverse back 
> to the original 96-dimensional space. Basically, I need to multiply my 
> 15-dimensional vectors by the 96x15 pc matrix to get back 96-dimensional 
> vectors. Such a method is missing from the PCA model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
