Hi Ethan,

Good points; the documentation is incomplete. The Arguments section describes the arguments only for command-line invocation, not for the Python and Scala APIs. This should be clearly marked to avoid confusion.

The Python wrappers are implemented to be compatible with MLlib and scikit-learn.

For training, you can pass features and labels in one of two ways:

1. Scikit-learn way: two Python objects (X_train, y_train) of type numpy, pandas or scipy.
     model.fit(X_train, y_train)
2. MLlib way: one LabeledPoint DataFrame with at least two columns: features (of type Vector) and labels.
     model.fit(X_df)

For prediction, you can pass features in one of two ways:

1. Scikit-learn way: one Python object (X_test) of type numpy, pandas or scipy.
     model.predict(X_test)
2. MLlib way: one LabeledPoint DataFrame (df_test) with at least one column: features (of type Vector).
     model.transform(df_test)

The usage is briefly described in
https://apache.github.io/incubator-systemml/beginners-guide-python.html#invoke-systemmls-algorithms

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Ethan Xu <ethan.yifa...@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 04/19/2017 02:07 PM
Subject: Documents of SystemML Algorithms Reference

Hello,

I'm reading the documents on Multinomial Logistic Regression (https://apache.github.io/incubator-systemml/algorithms-classification.html#usage) with the Scala API. It says

    val model = lr.fit(X_train_df)
    val prediction = model.transform(X_test_df)

The "Arguments" section below it says:

X: Location (on HDFS) to read the input matrix of feature vectors; each row constitutes one feature vector.
Y: Location to read the input one-column matrix of category labels that correspond to feature vectors in X. Note the following:...

The explanation of the arguments seems to correspond to the Hadoop and Spark API. Could someone please advise what the specifications of `X_train_df` and `X_test_df` are? Are they the same as specified in the Python API?
i.e.:

    # X_train, y_train and X_test can be NumPy matrices or Pandas DataFrame or SciPy Sparse Matrix
    y_test = logistic.fit(X_train, y_train).predict(X_test)
    # df_train is DataFrame that contains two columns: "features" (of type Vector) and "label". df_test is a DataFrame that contains the column "features"

The explanation of arguments for Python/Scala seems to be missing for other algorithms, too.

Thanks a lot,
Ethan
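
The two calling conventions described in the reply (scikit-learn style with separate X and y, and MLlib style with one table holding "features" and "label" columns) can be illustrated with a small stand-in. This is only a sketch and does not use SystemML, Spark, or NumPy: `MajorityClassifier` is a hypothetical toy estimator, and a list of dicts is assumed here as a stand-in for a DataFrame so that the example runs with the Python standard library alone.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator (NOT SystemML's implementation).

    It always predicts the most frequent training label; the point is only
    to show one estimator accepting both calling conventions.
    """

    def fit(self, X, y=None):
        if y is None:
            # MLlib-style call: X is a table (here a list of dict rows)
            # with a "features" column and a "label" column.
            labels = [row["label"] for row in X]
        else:
            # scikit-learn-style call: features and labels passed separately.
            labels = list(y)
        if not labels:
            raise ValueError("cannot fit on an empty training set")
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, X):
        # scikit-learn style: return one predicted label per feature row.
        return [self.majority_ for _ in X]

    def transform(self, rows):
        # MLlib style: return the input rows with a "prediction" column added.
        return [dict(row, prediction=self.majority_) for row in rows]


# Toy data: three feature vectors, label 2 is the majority class.
X_train = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_train = [1, 2, 2]
X_test = [[0.5, 0.5]]

# Stand-ins for df_train / df_test (assumed representation, see lead-in).
df_train = [{"features": x, "label": y} for x, y in zip(X_train, y_train)]
df_test = [{"features": x} for x in X_test]

# scikit-learn way: fit(X, y) then predict(X_test).
y_test = MajorityClassifier().fit(X_train, y_train).predict(X_test)

# MLlib way: fit(df_train) then transform(df_test).
df_pred = MajorityClassifier().fit(df_train).transform(df_test)
```

With the real wrappers the call shapes are the same as above, but the inputs would be NumPy/pandas/SciPy objects or Spark DataFrames, as described in the reply.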