[
https://issues.apache.org/jira/browse/SPARK-21209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062908#comment-16062908
]
Ben St. Clair commented on SPARK-21209:
---------------------------------------
I would love to chat about design with anyone interested in this feature. I've
been working on an ML implementation on a recent fork, and, while I have put in
considerable work, I am not attached to one solution or another (I'll post it
in a day or two, to stimulate conversation or jumpstart a solution). As with
the existing PCA implementation in ML, one notable design constraint is the
temporary dependence on the MLlib linear algebra package, at least until a
similar package is available for ML datasets.
> Implement Incremental PCA algorithm for MLlib
> ---------------------------------------------
>
> Key: SPARK-21209
> URL: https://issues.apache.org/jira/browse/SPARK-21209
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Affects Versions: 2.1.1
> Reporter: Ben St. Clair
> Labels: features
>
> Incremental Principal Component Analysis is a method for computing PCA
> incrementally, allowing one to update an existing PCA model as new
> evidence arrives. An alpha parameter can additionally be used to enable
> task-specific weighting of new and old evidence.
> This algorithm would be useful for streaming applications, where a fast and
> adaptive feature subspace calculation could be applied. It can also
> be applied to combine PCAs from subcomponents of large datasets.
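
To make the description above concrete, here is a minimal sketch of one
possible reading of the alpha-weighted update. It is not the implementation
from the fork and not a Spark API. It assumes the model keeps a running mean
and covariance, that alpha in [0, 1] is the weight given to the new batch, and
that the leading component is recovered by power iteration. The names
blend_moments and top_component are hypothetical, and plain Python lists stand
in for the MLlib linear algebra types.

```python
import math


def blend_moments(mean, cov, batch, alpha):
    """Blend an existing mean/covariance with a new batch of rows.

    Assumption: alpha is the weight on the new evidence and (1 - alpha)
    the weight on the old model, i.e. the result is the mean/covariance
    of a two-part mixture of old and new data.
    """
    n, d = len(batch), len(mean)
    b_mean = [sum(row[j] for row in batch) / n for j in range(d)]
    # Population covariance of the batch about its own mean.
    b_cov = [[sum((r[i] - b_mean[i]) * (r[j] - b_mean[j]) for r in batch) / n
              for j in range(d)] for i in range(d)]
    delta = [b_mean[j] - mean[j] for j in range(d)]
    new_mean = [mean[j] + alpha * delta[j] for j in range(d)]
    # The last term accounts for the shift between old and new means.
    new_cov = [[(1 - alpha) * cov[i][j] + alpha * b_cov[i][j]
                + alpha * (1 - alpha) * delta[i] * delta[j]
                for j in range(d)] for i in range(d)]
    return new_mean, new_cov


def top_component(cov, iters=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # cov is zero along v; keep the last estimate
        v = [x / norm for x in w]
    return v
```

For example, starting from an old model whose variance lies entirely on the
first axis and blending in a batch that varies only on the second axis, the
leading component moves to the second axis with alpha = 0.5 but stays on the
first axis with alpha = 0.1. The same blend could in principle combine models
fitted on separate partitions, with alpha set to the new partition's share of
the rows.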