[ 
https://issues.apache.org/jira/browse/SYSTEMML-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-1123:
--------------------------------------
    Summary: Refactor/Fork scikit-learn to make it scalable by using Python DSL 
 (was: Refactor scikit-learn to make it scalable by using Python DSL)

> Refactor/Fork scikit-learn to make it scalable by using Python DSL
> ------------------------------------------------------------------
>
>                 Key: SYSTEMML-1123
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1123
>             Project: SystemML
>          Issue Type: New Feature
>            Reporter: Niketan Pansare
>
> 1. Eliminate the explicit conversion of SystemML matrices to NumPy arrays: 
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L382
> 2. Use scalable SystemML operations whenever possible.
> The following code should work:
> {code:java}
> from sklearn import datasets, neighbors, linear_model
> import systemml as sml
> X_train = sml.matrix( ... )
> y_train = sml.matrix( ... )
> X_test = sml.matrix( ... )
> y_test = sml.matrix( ... )
> knn = neighbors.KNeighborsClassifier()
> logistic = linear_model.LogisticRegression()
> print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
> print('LogisticRegression score: %f'
>       % logistic.fit(X_train, y_train).score(X_test, y_test))
> {code}
> [~iyounus] [~freiss] [~reinwald]
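
As a rough illustration of item 1 above, input validation could hand a distributed matrix straight through instead of materializing it locally. The sketch below is only an assumption about how that might look: `LazyMatrix` and `check_array_passthrough` are hypothetical names that exist in neither scikit-learn nor SystemML, and a plain list of lists stands in for a NumPy array so the example has no dependencies.

```python
# Hypothetical sketch: none of these names come from scikit-learn or SystemML.

class LazyMatrix:
    """Stand-in for a distributed matrix such as the `sml.matrix` in the issue."""

    def __init__(self, rows):
        self.rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)


def check_array_passthrough(X, passthrough_types=(LazyMatrix,)):
    """Validate X, converting to a dense list of lists only when necessary."""
    if isinstance(X, passthrough_types):
        # Already a scalable matrix: check the shape, skip the dense conversion.
        if len(X.shape) != 2:
            raise ValueError("expected a 2-D matrix")
        return X
    # Fallback: materialize ordinary Python sequences locally.
    dense = [list(row) for row in X]
    if any(len(row) != len(dense[0]) for row in dense):
        raise ValueError("rows have inconsistent lengths")
    return dense


X_lazy = LazyMatrix([[1.0, 2.0], [3.0, 4.0]])
assert check_array_passthrough(X_lazy) is X_lazy  # same object, no conversion
assert check_array_passthrough([(1, 2), (3, 4)]) == [[1, 2], [3, 4]]
```

Returning the same object for recognized matrix types leaves downstream estimators free to call scalable operations on it (item 2), while ordinary inputs still go through the dense path.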



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
