Thank you for the reply. I have created a Jira issue and pinged mengxr.
Here is the link: https://issues.apache.org/jira/browse/SPARK-10691

I did not find jkbradley on Jira, though I saw he is on GitHub.

BTW, should I create a pull request that removes the private modifier, for further discussion?

Thx.

On Thu, Sep 17, 2015 at 6:44 PM, Feynman Liang <fli...@databricks.com> wrote:

> We have kept that private because we need to decide on a name for the
> method which evaluates on a test set (see the TODO comment
> <https://github.com/apache/spark/pull/7099/files#diff-668c79317c51f40df870d3404d8a731fR272>);
> perhaps you could push for this to happen by creating a Jira and pinging
> jkbradley and mengxr. Thanks!
>
> On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren <inv...@gmail.com> wrote:
>
>> Working on spark.ml.classification.LogisticRegression.scala (spark 1.5),
>>
>> It might be useful if we could create a summary for any given dataset, not
>> just the training set.
>> Currently, BinaryLogisticRegressionTrainingSummary is only created when
>> the model is fitted on the training set.
>> As usual, we need to summarize the test set to learn about the model's performance.
>> However, we cannot create our own BinaryLogisticRegressionSummary for
>> another data set (of type DataFrame), because the Summary class is "private"
>> in the classification package.
>>
>> Would it be better to remove the "private" access modifier and allow the
>> following code on the user side:
>>
>> val lr = new LogisticRegression()
>>
>> val model = lr.fit(trainingSet)
>>
>> val binarySummary =
>>   new BinaryLogisticRegressionSummary(
>>     model.transform(testSet),
>>     lr.probabilityCol,
>>     lr.labelCol
>>   )
>>
>> binarySummary.roc
>>
>>
>> Thus, we could use the model to summarize any data set we want.
>>
>> If there is a way to summarize a test set, please let me know. I have browsed
>> LogisticRegression.scala, but failed to find one.
>>
>> Thx.
>>
>> --
>> Hao Ren
>>
>> Data Engineer @ leboncoin
>>
>> Paris, France
>>
>
>

--
Hao Ren

Data Engineer @ leboncoin

Paris, France