Niketan Pansare created SYSTEMML-1123:
-----------------------------------------
Summary: Refactor scikit-learn to make it scalable by using Python DSL
Key: SYSTEMML-1123
URL: https://issues.apache.org/jira/browse/SYSTEMML-1123
Project: SystemML
Issue Type: New Feature
Reporter: Niketan Pansare
1. Eliminate the explicit conversion of SystemML matrices to NumPy arrays:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L382
2. Use scalable SystemML operations whenever possible
The following code should work:
{code:java}
from sklearn import datasets, neighbors, linear_model
import systemml as sml
X_train = sml.matrix( ... )
y_train = sml.matrix( ... )
X_test = sml.matrix( ... )
y_test = sml.matrix( ... )
knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
{code}
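One way item 1 could look, as a rough sketch only: the input validation step passes a scalable matrix through untouched instead of materializing it locally. Everything here is an assumption made for illustration. LazyMatrix is a hypothetical stand-in for a SystemML matrix (it is not the systemml API), and check_array below is a simplified toy, not scikit-learn's actual function.

```python
# Hypothetical sketch: LazyMatrix stands in for a SystemML matrix.
# It is NOT the real systemml API; it only records whether it was materialized.
class LazyMatrix:
    def __init__(self, rows):
        self._rows = [list(r) for r in rows]
        n_cols = len(self._rows[0]) if self._rows else 0
        self.shape = (len(self._rows), n_cols)
        self.materialized = False

    def to_local(self):
        # Pulling the data into local memory is the step item 1 wants to avoid.
        self.materialized = True
        return [list(r) for r in self._rows]


def check_array(X):
    """Toy validator: check shape, but never force a LazyMatrix into local memory."""
    if isinstance(X, LazyMatrix):
        if len(X.shape) != 2:
            raise ValueError("expected a 2-D matrix")
        return X  # pass through unchanged, no conversion
    # Ordinary local input: normalize to a list of equal-length rows.
    rows = [list(r) for r in X]
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must all have the same length")
    return rows


X = LazyMatrix([[1.0, 2.0], [3.0, 4.0]])
validated = check_array(X)
```

With this shape of check, the estimator receives the same matrix object it was given, so any later computation can be expressed as operations on that object (item 2) rather than on a local copy.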
[~iyounus] [~freiss] [~reinwald]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)