[
https://issues.apache.org/jira/browse/SPARK-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413013#comment-16413013
]
Sujith Jay Nair commented on SPARK-23437:
-----------------------------------------
+1 for this initiative. To garner support for it, we need to
come up with strong reasons why GPs are needed as part of Spark ML. This could
be done as part of the documentation of your implementation.
You do mention GPflow as an example of the TensorFlow ecosystem supporting
linear-time GPs; however, it is still a third-party library. If anything, it
supports the view that this functionality should be kept separate from
core Spark ML. As Seth Henderson mentions above, it would help tremendously
to showcase more packages that implement this algorithm.
> [ML] Distributed Gaussian Process Regression for MLlib
> ------------------------------------------------------
>
> Key: SPARK-23437
> URL: https://issues.apache.org/jira/browse/SPARK-23437
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Affects Versions: 2.2.1
> Reporter: Valeriy Avanesov
> Assignee: Apache Spark
> Priority: Major
>
> Gaussian Process Regression (GP) is a well-known black-box non-linear
> regression approach [1]. For years the approach remained inapplicable to
> large samples due to its cubic computational complexity; however, more recent
> techniques (Sparse GP) reduce this to linear complexity. The field
> continues to attract the interest of researchers – several papers devoted to
> GPs were presented at NIPS 2017.
> Unfortunately, the non-parametric regression techniques that ship with MLlib
> are restricted to tree-based approaches.
> I propose to create and include an implementation (which I am going to work
> on) of the so-called robust Bayesian Committee Machine proposed and
> investigated in [2].
> [1] Carl Edward Rasmussen and Christopher K. I. Williams. 2005. _Gaussian
> Processes for Machine Learning (Adaptive Computation and Machine Learning)_.
> The MIT Press.
> [2] Marc Peter Deisenroth and Jun Wei Ng. 2015. Distributed Gaussian
> processes. In _Proceedings of the 32nd International Conference on
> Machine Learning - Volume 37_ (ICML'15), Francis
> Bach and David Blei (Eds.), Vol. 37. JMLR.org, 1481-1490.
>