[
https://issues.apache.org/jira/browse/SPARK-16105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346676#comment-15346676
]
Stefan Panayotov commented on SPARK-16105:
------------------------------------------
Well, what we currently do is:
- save the PCA Matrix, e.g.
val principalMatrix = pcaMod.pc
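- define a UDF as
import org.apache.spark.sql.functions.udf
// pc is 96x15, so multiplying a 15-dimensional vector yields a 96-dimensional one
def reversePCA = udf((v: org.apache.spark.mllib.linalg.Vector) => {
  principalMatrix.multiply(v)
})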
- use the UDF to create a new column in the DataFrame to hold the resulting
96-dimensional vector
val fitTransOut_df = cAssembler.withColumn("predErrors",
  reversePCA($"predPCDenseVec"))
Granted, the UDF is pretty simple, but I thought it would be logical to expect
a method (as part of the model) to do what the UDF is doing. That's all.
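To make the dimensions concrete, here is a minimal sketch of the multiplication the UDF performs, written in plain Scala with no Spark dependency. The object `ReversePcaSketch` and its `reverse` helper are hypothetical names used only for illustration, not part of Spark. The sketch assumes the matrix values are stored column-major, as in MLlib's DenseMatrix, and uses a toy 3x2 matrix in place of the 96x15 pc matrix.

```scala
// Hypothetical helper, not part of Spark: multiplies a (numRows x numCols)
// matrix, given as a column-major array, by a vector of length numCols.
object ReversePcaSketch {
  def reverse(pcValues: Array[Double], numRows: Int, numCols: Int,
              reduced: Array[Double]): Array[Double] = {
    require(pcValues.length == numRows * numCols, "matrix size mismatch")
    require(reduced.length == numCols, "vector needs one entry per component")
    val out = new Array[Double](numRows)
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        // Column-major layout: entry (i, j) sits at index i + j * numRows.
        out(i) += pcValues(i + j * numRows) * reduced(j)
        i += 1
      }
      j += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // Toy 3x2 matrix standing in for the 96x15 pc matrix.
    val pc = Array(1.0, 0.0, 0.0,  // first column
                   0.0, 1.0, 0.0)  // second column
    val restored = reverse(pc, 3, 2, Array(2.0, 5.0))
    println(restored.mkString("[", ", ", "]")) // prints [2.0, 5.0, 0.0]
  }
}
```

Note that the result is only an approximation of the original vector, since whatever the dropped components carried cannot be recovered from the reduced representation.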
> PCA Reverse Transformer
> -----------------------
>
> Key: SPARK-16105
> URL: https://issues.apache.org/jira/browse/SPARK-16105
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Affects Versions: 1.6.1
> Reporter: Stefan Panayotov
> Priority: Minor
>
> The PCA class has a fit method that returns a PCAModel. One of the members of
> the PCAModel is a pc (Principal Components Matrix). This matrix is available
> for inspection, but there is no method to use this matrix for reverse
> transformation back to the original dimension. For example, if I use PCA
> to reduce the dimensionality of my space from 96 to 15, I get a 96x15 pc matrix.
> I can do some modeling in my reduced space, and then I need to reverse back
> to the original 96-dimensional space. Basically, I need to multiply my
> 15-dimensional vectors by the 96x15 pc matrix to get back 96-dimensional
> vectors. Such a method is missing from the PCA model.