GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/772
[FLINK-2116] [ml] Reusing predict operation for evaluation
This PR adds an `evaluate` method to `Predictor` which takes a
`DataSet[Testing]` and returns a `DataSet[(LabelType, LabelType)]`, where the
first tuple field is the true label and the second field is the predicted
label. The evaluation logic is defined via an `EvaluateDataSetOperation`.
Since predicting and evaluating test data rely on the same prediction
logic, a new level of abstraction was introduced. The old
`PredictOperation` has been renamed to `PredictDataSetOperation`, and a new
`PredictOperation` was defined. The new `PredictOperation` takes a single
element of the dataset together with the model of the associated `Predictor`
and computes one prediction.
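A minimal sketch of the per-element abstraction, assuming a simplified signature: the trait below is a hypothetical stand-in, not Flink's actual `PredictOperation` (which also deals with `DataSet`s of models), and `LinearModel` / `LinearPredictOperation` are made-up names for illustration. The point is that the implementer only describes how to score one element given the model.

```scala
object PredictOperationSketch {
  // Hypothetical, simplified per-element operation: given one test element
  // and the trained model, return one prediction. Not Flink's real signature.
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(value: Testing, model: Model): Prediction
  }

  // Toy model used only for illustration: y = weight * x + bias.
  final case class LinearModel(weight: Double, bias: Double)

  // The predictor author only has to say how a single element is scored.
  object LinearPredictOperation extends PredictOperation[LinearModel, Double, Double] {
    def predict(value: Double, model: LinearModel): Double =
      model.weight * value + model.bias
  }

  val model: LinearModel = LinearModel(weight = 2.0, bias = 1.0)
  // 2.0 * 3.0 + 1.0
  val prediction: Double = LinearPredictOperation.predict(3.0, model)
}
```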
The predict operation of a `Predictor` can be implemented at one of two
levels: as a `PredictDataSetOperation`, which gives access to the whole
`DataSet` of input elements, or as a `PredictOperation`. In the latter case,
the system automatically applies the operation to every element of the input
`DataSet` (see `Predictor.defaultPredictDataSetOperation`).
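The default lifting can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `Seq` stands in for `DataSet`, the model is passed as a plain value, and both traits are hypothetical simplifications. In Flink the model would presumably have to be shipped to each parallel instance (for example as a broadcast variable) rather than captured in a closure.

```scala
object DefaultLiftingSketch {
  // Hypothetical per-element operation (simplified).
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(value: Testing, model: Model): Prediction
  }

  // Hypothetical whole-collection operation; Seq stands in for DataSet.
  trait PredictDataSetOperation[Model, Testing, Prediction] {
    def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction]
  }

  // Sketch of a default lifting: apply the per-element operation to
  // every element of the input collection.
  def defaultPredictDataSetOperation[Model, Testing, Prediction](
      op: PredictOperation[Model, Testing, Prediction]
  ): PredictDataSetOperation[Model, Testing, Prediction] =
    new PredictDataSetOperation[Model, Testing, Prediction] {
      def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction] =
        input.map(value => op.predict(value, model))
    }

  // Toy per-element operation: scale each input by the model's factor.
  object ScalePredictOperation extends PredictOperation[Int, Int, Int] {
    def predict(value: Int, model: Int): Int = value * model
  }

  val lifted: PredictDataSetOperation[Int, Int, Int] =
    defaultPredictDataSetOperation[Int, Int, Int](ScalePredictOperation)
  val predictions: Seq[Int] = lifted.predictDataSet(10, Seq(1, 2, 3))
}
```

An implementer who needs the whole input at once (e.g. for a global normalization step) would instead implement the collection-level trait directly.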
Defining a `PredictOperation` also makes `evaluate` available for this
`Predictor` without having to write an `EvaluateDataSetOperation`. The only
constraint is that the input data must be of type `DataSet[(TestingType,
LabelType)]`, i.e. each element is a tuple of a testing value and its true
label. The system then computes the prediction for the testing value and
returns a `DataSet[(LabelType, LabelType)]` whose first field is the true
label and whose second field is the predicted label.
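The derived `evaluate` can be sketched like this, again assuming simplified hypothetical types (`Seq` instead of `DataSet`, a plain model value, and a made-up `ThresholdPredictOperation`). It shows why the input must be `(TestingType, LabelType)`: the testing value is fed to the per-element prediction, and the true label is carried through unchanged so it can be paired with the prediction.

```scala
object EvaluateSketch {
  // Hypothetical per-element operation (simplified, not Flink's signature).
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(value: Testing, model: Model): Prediction
  }

  // Sketch of an evaluate derived from a PredictOperation whose prediction
  // type equals the label type: output is (trueLabel, predictedLabel).
  def evaluate[Model, Testing, Label](
      op: PredictOperation[Model, Testing, Label],
      model: Model,
      input: Seq[(Testing, Label)]
  ): Seq[(Label, Label)] =
    input.map { case (testing, trueLabel) => (trueLabel, op.predict(testing, model)) }

  // Toy binary classifier: the model is a threshold, labels are +1.0 / -1.0.
  object ThresholdPredictOperation extends PredictOperation[Double, Double, Double] {
    def predict(value: Double, model: Double): Double =
      if (value >= model) 1.0 else -1.0
  }

  val testData: Seq[(Double, Double)] = Seq((0.2, -1.0), (0.7, 1.0), (0.6, -1.0))
  val truthAndPrediction: Seq[(Double, Double)] =
    evaluate[Double, Double, Double](ThresholdPredictOperation, 0.5, testData)
}
```

With a threshold of 0.5 the third element is misclassified, which shows up as the pair `(-1.0, 1.0)`; any downstream metric only needs these pairs.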
What do you think of these changes? Will they ease the development of
future `Predictor`s?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink evaluatePredictor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/772.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #772
----
commit 49c02514a6a23d7ef95ce46966ff7ee7a1f407ad
Author: Till Rohrmann <[email protected]>
Date: 2015-06-02T12:34:27Z
[FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation
which can be reused by evaluate if the input data is of the format
(TestingType, LabelType) where the second tuple field represents the true label.
----