GitHub user thvasilo opened a pull request:
https://github.com/apache/flink/pull/871
[FLINK-2157] [ml] [WIP] Create evaluation framework for ML library
WIP PR for the model evaluation framework for FlinkML.
The evaluation follows sklearn's paradigm: a Scorer object is created
with a performance score (the counterpart of sklearn's metrics) and provides
an evaluate function that takes a trained model and a test dataset and
produces a score.
The performance scores and the Scorer are implemented in the
flink.ml.evaluation package.
Currently we have squared loss, zero-one loss, accuracy score for
classification and R^2 score for regression.
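
To make the Scorer idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not the FlinkML Scala API from this PR: the names `Scorer`, `evaluate`, and the four score functions mirror the description above, while `ThresholdClassifier`, the `predict` method, and the choice to average the losses over the test set are assumptions made only for illustration.

```python
from typing import Callable, Iterable, Sequence, Tuple

# A score maps (true values, predictions) to a single number.
Score = Callable[[Sequence[float], Sequence[float]], float]


def squared_loss(y_true, y_pred):
    # Assumption: averaged over the test set (the PR may aggregate differently).
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def zero_one_loss(y_true, y_pred):
    # Fraction of examples whose prediction differs from the true label.
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


def accuracy_score(y_true, y_pred):
    # Fraction of examples whose prediction equals the true label.
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def r2_score(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


class Scorer:
    """Wraps one score and applies it to a trained model and a test set."""

    def __init__(self, score: Score):
        self.score = score

    def evaluate(self, model, test_data: Iterable[Tuple[float, float]]) -> float:
        # test_data holds (features, true label) pairs.
        features, labels = zip(*test_data)
        predictions = [model.predict(x) for x in features]
        return self.score(labels, predictions)


class ThresholdClassifier:
    """Hypothetical stand-in for a trained model; predicts 1.0 at or above 0.5."""

    def predict(self, x: float) -> float:
        return 1.0 if x >= 0.5 else 0.0


test_set = [(0.1, 0.0), (0.4, 0.0), (0.6, 1.0), (0.9, 0.0)]
print(Scorer(accuracy_score).evaluate(ThresholdClassifier(), test_set))  # prints 0.75
```

Keeping the score separate from the Scorer means the same `evaluate` call works for any metric with the `(y_true, y_pred)` shape.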
Finally, a score function has been added to regression algorithms (and will
be added to classifiers as well). It provides an intuitive way to evaluate the
performance of an algorithm without the need to create a Scorer, as per
[FLINK-2108](https://issues.apache.org/jira/browse/FLINK-2108).
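
The convenience score function could look roughly like the following plain-Python sketch. Again, this is not the code in the PR: the `Regressor` mix-in name comes from the commit list, but the use of R^2 as the default score (as sklearn does for regressors), the `predict` method, and the `LinearModel` class are assumptions for illustration.

```python
def r2_score(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


class Regressor:
    """Mix-in that gives every regression model a score() shortcut."""

    def predict(self, x):
        raise NotImplementedError

    def score(self, test_data):
        # Assumption: R^2 is the default score, following sklearn's regressors.
        features, labels = zip(*test_data)
        predictions = [self.predict(x) for x in features]
        return r2_score(labels, predictions)


class LinearModel(Regressor):
    """Hypothetical already-trained model: y = slope * x + intercept."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, x):
        return self.slope * x + self.intercept


test_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
print(LinearModel(2.0, 1.0).score(test_set))  # prints 1.0 (perfect fit)
```

The caller never constructs a Scorer here; the model picks a sensible default score itself.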
The PR currently includes some work from Mikio Braun for a linear
regression solver, but that will be moved to a separate PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/thvasilo/flink evaluation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/871.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #871
----
commit ac373fb4af39d288c5b61bf1c86b1de5556748a6
Author: Till Rohrmann <[email protected]>
Date: 2015-06-02T12:34:27Z
[FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation
which can be reused by evaluate if the input data is of the format
(TestingType, LabelType) where the second tuple field represents the true label.
commit 7133cafb643d545fa5c66bedc7d5eda847faeb62
Author: mikiobraun <[email protected]>
Date: 2015-06-09T11:25:34Z
First working version of a simpler least squares implementation
Not done any work integrating that with the Flink Pipeline stuff
commit f5315c0ce59b6a32c8aeb81ebba2a5982e981835
Author: mikiobraun <[email protected]>
Date: 2015-06-10T08:49:55Z
reduce amount of toString computations for large collections
commit 74aafa00e7e61003e081f9b54697ee9904487544
Author: mikiobraun <[email protected]>
Date: 2015-06-12T15:18:39Z
simple lsr into pipeline
commit f5c498ba1ba58a51f265f922fdce312518fcbf68
Author: mikiobraun <[email protected]>
Date: 2015-06-19T11:23:53Z
working on the Simple LSR tests
commit f37c41fc1d0b959c60c3e06f7d4633b57a7b87ac
Author: mikiobraun <[email protected]>
Date: 2015-06-19T14:32:54Z
slightly better checks in the SimpleLeastSquaresRegressionTest
commit aae27c2f25792143febb900a11f4980ca1159aae
Author: mikiobraun <[email protected]>
Date: 2015-06-22T15:04:42Z
Adding some first loss functions for the evaluation framework
commit 4d115f7db3e569655e2fb156f18ec897cd573089
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-06-23T14:07:48Z
Scorer for evaluation
commit 1e7309d7ba2519e2520ed816456cfa2ca8e92510
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-06-25T09:41:10Z
Adds accuracy score and R^2 score. Also trying out Scores as classes
instead of functions.
Not too happy with the extra boilerplate of Scores as classes; will probably
revert and have objects like RegressionsScores, ClassificationScores that
contain the definitions of the relevant scores.
commit 3e275d567e2c4fe0b72875cfb54645dd346b4e22
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-06-26T11:30:56Z
Adds an evaluate operation for LabeledVector input
commit 8c194be4a39170cb7f4865ae1dd39ebbeeddef7e
Author: Theodore Vasiloudis <[email protected]>
Date: 2015-06-26T11:32:13Z
Adds Regressor interface, and a score function for regression algorithms.
----