Thanks Niketan! This helps a lot.

Best,
Ethan

On Wed, Apr 19, 2017 at 5:21 PM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Ethan,
>
> Good points, the documentation is incomplete. The Arguments section only
> describes the arguments for command-line invocation, not for invocation
> via Python and Scala. This should be clearly marked to avoid confusion.
>
> The Python wrappers are implemented to be compatible with MLlib and
> scikit-learn.
>
> For training, you can pass features and labels as
> 1. Scikit-learn way: two Python objects (X_train, y_train) of type numpy,
>    pandas or scipy.
>    model.fit(X_train, y_train)
>
> OR
>
> 2. MLlib way: one LabeledPoint DataFrame with at least two columns:
>    features (of type Vector) and labels.
>    model.fit(X_df)
>
> For prediction, you can pass features as
> 1. Scikit-learn way: one Python object (X_test) of type numpy, pandas or
>    scipy.
>    model.predict(X_test)
>
> OR
>
> 2. MLlib way: one LabeledPoint DataFrame (df_test) with at least one
>    column: features (of type Vector).
>    model.transform(df_test)
>
> The usage is briefly described in
> https://apache.github.io/incubator-systemml/beginners-guide-python.html#invoke-systemmls-algorithms
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Ethan Xu <ethan.yifa...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 04/19/2017 02:07 PM
> Subject: Documents of SystemML Algorithms Reference
> ------------------------------
>
> Hello,
>
> I'm reading the documents on Multinomial Logistic Regression
> (https://apache.github.io/incubator-systemml/algorithms-classification.html#usage)
> with the Scala API. It says
>
>    val model = lr.fit(X_train_df)
>    val prediction = model.transform(X_test_df)
>
> The "Arguments" section below it says:
>
> X: Location (on HDFS) to read the input matrix of feature vectors; each row
> constitutes one feature vector.
>
> Y: Location to read the input one-column matrix of category labels that
> correspond to feature vectors in X. Note the following:...
>
> The explanation of the arguments seems to correspond to the Hadoop and
> Spark API.
>
> Could someone please advise what the specifications of `X_train_df` and
> `X_test_df` are? Are they the same as specified in the Python API? i.e.:
>
>    # X_train, y_train and X_test can be NumPy matrices or Pandas DataFrame or SciPy Sparse Matrix
>    y_test = logistic.fit(X_train, y_train).predict(X_test)
>    # df_train is DataFrame that contains two columns: "features" (of type Vector) and "label". df_test is a DataFrame that contains the column "features"
>
> The explanation of arguments for Python/Scala seems to be missing for
> other algorithms, too.
>
> Thanks a lot,
>
> Ethan
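
To make the two calling conventions discussed in this thread concrete, here is a minimal, self-contained Python sketch. It is not SystemML code and makes no claim about how the SystemML wrappers are implemented: `DualApiClassifier` is a hypothetical class, a list of dicts stands in for a Spark DataFrame, and a toy 1-nearest-neighbour rule stands in for the real algorithm. It only illustrates how one estimator could accept `fit(X_train, y_train)` / `predict(X_test)` as well as `fit(df_train)` / `transform(df_test)`.

```python
class DualApiClassifier:
    """Hypothetical estimator (NOT SystemML) accepting both calling styles."""

    def fit(self, X, y=None):
        if y is None:
            # MLlib-style call: X is one table-like object whose rows carry
            # a "features" vector and a "label" (a list of dicts stands in
            # for a DataFrame here; that stand-in is an assumption).
            rows = list(X)
            features = [list(r["features"]) for r in rows]
            labels = [r["label"] for r in rows]
        else:
            # scikit-learn-style call: features and labels passed separately.
            features = [list(v) for v in X]
            labels = list(y)
        if not features or len(features) != len(labels):
            raise ValueError("need equally many feature vectors and labels")
        self._features, self._labels = features, labels
        return self

    def _predict_one(self, vector):
        # Toy 1-nearest-neighbour rule by squared Euclidean distance.
        def sq_dist(other):
            return sum((a - b) ** 2 for a, b in zip(vector, other))
        best = min(range(len(self._features)),
                   key=lambda i: sq_dist(self._features[i]))
        return self._labels[best]

    def predict(self, X):
        # scikit-learn-style prediction: returns a plain list of labels.
        return [self._predict_one(list(v)) for v in X]

    def transform(self, rows):
        # MLlib-style prediction: returns the rows with a "prediction" column.
        return [dict(r, prediction=self._predict_one(list(r["features"])))
                for r in rows]


X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y_train = [1, 1, 2, 2]
X_test = [[0.05, 0.1], [1.0, 0.9]]

# scikit-learn way
sk_pred = DualApiClassifier().fit(X_train, y_train).predict(X_test)

# MLlib way
df_train = [{"features": f, "label": l} for f, l in zip(X_train, y_train)]
df_test = [{"features": f} for f in X_test]
ml_pred = DualApiClassifier().fit(df_train).transform(df_test)
```

Both paths end up training on the same data, so `sk_pred` and the `prediction` values in `ml_pred` agree. The only difference is the shape of the input and output: separate arrays in and a list of labels out, versus one table in and an augmented table out.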