Hi Ethan,

Good points. The documentation is incomplete: the Arguments section describes
only the arguments for command-line invocation, not those for the Python and
Scala APIs. This should be clearly marked to avoid confusion.

The Python wrappers are implemented to be compatible with both MLlib and
scikit-learn.

For training, you can pass features and labels in either of two ways:
1. Scikit-learn way: two Python objects (X_train, y_train), each a NumPy
array, pandas DataFrame or SciPy sparse matrix.
model.fit(X_train, y_train)

OR

2. MLlib way: one LabeledPoint DataFrame with at least two columns: features
(of type Vector) and labels.
model.fit(X_df)

For prediction, you can pass features in either of two ways:
1. Scikit-learn way: one Python object (X_test) that is a NumPy array, pandas
DataFrame or SciPy sparse matrix.
model.predict(X_test)

OR

2. MLlib way: one LabeledPoint DataFrame (df_test) with at least one column:
features (of type Vector).
model.transform(df_test)
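
To make the two calling conventions concrete, here is a minimal pure-Python
sketch. It is not SystemML code: ToyClassifier is a hypothetical stand-in that
always predicts the most frequent training label, and lists of dicts stand in
for a Spark DataFrame. It only illustrates how one estimator could accept
either fit(X_train, y_train) or fit(df), and offer both predict and transform.

```python
# Toy stand-in, NOT SystemML code: it only mimics the two calling conventions.
class ToyClassifier:
    """Predicts the most frequent training label, ignoring the features."""

    def fit(self, X, y=None):
        if y is None:
            # MLlib-style call: X is one table whose rows carry both a
            # "features" and a "label" entry (dicts stand in for DataFrame rows).
            labels = [row["label"] for row in X]
        else:
            # scikit-learn-style call: features and labels arrive separately.
            if len(X) != len(y):
                raise ValueError("X and y must have the same number of rows")
            labels = list(y)
        # Most frequent label; trailing underscore marks a fitted attribute.
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        # scikit-learn style: return one predicted label per feature row.
        return [self.majority_ for _ in X]

    def transform(self, df):
        # MLlib style: return the input rows with a "prediction" entry added.
        return [dict(row, prediction=self.majority_) for row in df]


X_train = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_train = [1, 2, 2]
df_train = [{"features": f, "label": l} for f, l in zip(X_train, y_train)]
df_test = [{"features": [0.5, 0.5]}]

# Scikit-learn way: separate features and labels, then predict().
sk_pred = ToyClassifier().fit(X_train, y_train).predict([[0.5, 0.5]])

# MLlib way: one table for training, then transform() on the test table.
ml_pred = ToyClassifier().fit(df_train).transform(df_test)
```

Both paths train on the same data here, so sk_pred holds the same label that
appears in the "prediction" entry of ml_pred.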

The usage is briefly described in
https://apache.github.io/incubator-systemml/beginners-guide-python.html#invoke-systemmls-algorithms


Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Ethan Xu <ethan.yifa...@gmail.com>
To:     dev@systemml.incubator.apache.org
Date:   04/19/2017 02:07 PM
Subject:        Documents of SystemML Algorithms Reference



Hello,

I'm reading the documentation on Multinomial Logistic Regression (
https://apache.github.io/incubator-systemml/algorithms-classification.html#usage
)
for the Scala API. It says:

val model = lr.fit(X_train_df)
val prediction = model.transform(X_test_df)


The "Arguments" section below it says:

X: Location (on HDFS) to read the input matrix of feature vectors; each row
constitutes one feature vector.

Y: Location to read the input one-column matrix of category labels that
correspond to feature vectors in X. Note the following:...

The explanation of the arguments seems to correspond to the Hadoop and Spark
API.

Could someone please advise what the specifications of `X_train_df` and
`X_test_df` are? Are they the same as specified in the Python API? i.e.:

# X_train, y_train and X_test can be NumPy matrices or Pandas DataFrame or SciPy Sparse Matrix
y_test = logistic.fit(X_train, y_train).predict(X_test)
# df_train is DataFrame that contains two columns: "features" (of type Vector) and "label".
# df_test is a DataFrame that contains the column "features"

The explanation of the arguments for Python/Scala seems to be missing for
other algorithms, too.

Thanks a lot,

Ethan

