[
https://issues.apache.org/jira/browse/SYSTEMML-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Niketan Pansare updated SYSTEMML-1962:
--------------------------------------
Description:
The end goal of this JIRA is to support a model selection facility similar to
[http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection].
Currently, we support model selection using MLPipeline's cross-validator. For
example, replace `from pyspark.ml.classification import
LogisticRegression` with `from systemml.mllearn import LogisticRegression` in
the example at
http://spark.apache.org/docs/2.1.1/ml-tuning.html#example-model-selection-via-cross-validation.
However, this invokes k separate and independent MLContext calls. This PR
proposes to add new classes `GridSearchCV` and `RandomizedSearchCV`, and
possibly a Bayesian optimization class, which, like the mllearn estimators,
have `fit` and `predict` methods. When `fit` is called, these classes
internally generate a script that wraps the external script with a `parfor`.
For example:
{code}
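from sklearn import datasets
# GridSearchCV is the class proposed in this issue
from systemml.mllearn import GridSearchCV, SVM

iris = datasets.load_iris()
# Candidate values for the SVM parameter C
parameters = {'C': [1, 10]}
svm = SVM()
clf = GridSearchCV(svm, parameters)
# fit generates and runs one parfor script covering all candidates
clf.fit(iris.data, iris.target)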
{code}
would execute the script:
{code}
CVals = matrix("1; 10", rows=2, cols=1)
parfor(i in seq(1, nrow(CVals))) {
  C = CVals[i, 1]
  # SVM script
}
{code}
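The wrapping step described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not SystemML code: the helper `build_parfor_script` and the matrix name `paramVals` are hypothetical, and the generated text simply mirrors the shape of the script shown above.

```python
from itertools import product


def build_parfor_script(param_grid, body="# SVM script"):
    """Hypothetical helper: wrap `body` in a parfor over every grid point."""
    names = sorted(param_grid)
    # One row per combination of hyperparameter values (Cartesian product).
    rows = list(product(*(param_grid[name] for name in names)))
    # Rows are separated by ';' and columns by spaces, mirroring the
    # matrix(...) literal used in the script above.
    cells = "; ".join(" ".join(str(v) for v in row) for row in rows)
    lines = [
        'paramVals = matrix("%s", rows=%d, cols=%d)' % (cells, len(rows), len(names)),
        "parfor(i in seq(1, nrow(paramVals))) {",
    ]
    # Bind each hyperparameter to its column of the current row.
    for col, name in enumerate(names, start=1):
        lines.append("  %s = paramVals[i, %d]" % (name, col))
    # Indent the wrapped script so it sits inside the loop body.
    lines.extend("  " + line for line in body.splitlines())
    lines.append("}")
    return "\n".join(lines)


script = build_parfor_script({"C": [1, 10]})
print(script)
```

With more than one hyperparameter (for instance a second, purely illustrative key such as `tol`), the same helper would emit one matrix row per combination, so a single `parfor` still covers the whole grid.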
was: The end goal of this JIRA is to support model selection facility similar to
[http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection].
> Support model-selection via mllearn APIs
> ----------------------------------------
>
> Key: SYSTEMML-1962
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1962
> Project: SystemML
> Issue Type: New Feature
> Reporter: Niketan Pansare
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)