We have kept that private because we need to decide on a name for the
method which evaluates on a test set (see the TODO comment
<https://github.com/apache/spark/pull/7099/files#diff-668c79317c51f40df870d3404d8a731fR272>);
perhaps you could push for this to happen by creating a Jira and pinging
jkbradley and mengxr. Thanks!
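
In the meantime, one possible workaround is to compute test-set metrics with
the public evaluators instead of the private summary class. The sketch below
is only an illustration, assuming Spark 1.5, a label column of type Double,
and two DataFrames supplied by the caller; the helper name evaluateOnTestSet
is hypothetical and not part of Spark.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: fits on trainingSet, then scores testSet using public APIs only.
def evaluateOnTestSet(
    trainingSet: DataFrame,
    testSet: DataFrame): (Double, BinaryClassificationMetrics) = {
  val lr = new LogisticRegression()
  val model = lr.fit(trainingSet)
  val predictions = model.transform(testSet)

  // Area under the ROC curve via the spark.ml evaluator.
  val evaluator = new BinaryClassificationEvaluator()
    .setLabelCol(model.getLabelCol)
    .setRawPredictionCol(model.getRawPredictionCol)
    .setMetricName("areaUnderROC")
  val auc = evaluator.evaluate(predictions)

  // Full curves via spark.mllib: pair P(label = 1) with the true label.
  val scoreAndLabels = predictions
    .select(model.getProbabilityCol, model.getLabelCol)
    .rdd
    .map { case Row(prob: Vector, label: Double) => (prob(1), label) }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)

  (auc, metrics)
}
```

Calling metrics.roc() on the second element of the result gives the ROC curve
as an RDD of (false positive rate, true positive rate) pairs, which covers
much of what the summary's roc would provide.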

On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren <inv...@gmail.com> wrote:

> I am working with spark.ml.classification.LogisticRegression.scala (Spark 1.5).
>
> It would be useful to be able to create a summary for any given dataset, not
> just the training set.
> Currently, a BinaryLogisticRegressionTrainingSummary is only created when the
> model is fitted on the training set.
> As usual, we need to summarize the test set to assess model performance.
> However, we cannot create our own BinaryLogisticRegressionSummary for
> another data set (of type DataFrame), because the Summary class is "private"
> in the classification package.
>
> Would it be better to remove the "private" access modifier and allow the
> following code on user side:
>
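> // Proposed usage; requires BinaryLogisticRegressionSummary to be public
> val lr = new LogisticRegression()
>
> val model = lr.fit(trainingSet)
>
> val binarySummary =
>   new BinaryLogisticRegressionSummary(
>     model.transform(testSet),
>     lr.getProbabilityCol, // column name as a String, not the Param itself
>     lr.getLabelCol
>   )
>
> binarySummary.roc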
>
>
> This way, we could use the model to summarize any data set we want.
>
> If there is already a way to summarize a test set, please let me know. I have
> browsed LogisticRegression.scala but failed to find one.
>
> Thx.
>
> --
> Hao Ren
>
> Data Engineer @ leboncoin
>
> Paris, France
>
