[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951183#comment-15951183 ]
ASF GitHub Bot commented on FLINK-5785:
---------------------------------------
GitHub user p4nna opened a pull request:
https://github.com/apache/flink/pull/3659
[FLINK-5785] Add an Imputer for preparing data
Adds an imputer class, including tests, that can impute missing values into
sparse DataSets of Vectors. One can choose whether the median, the mean or the
most frequent value of a vector or row is inserted.
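The three strategies can be illustrated on a single vector. The following is a minimal plain-Python sketch, not the code from this pull request and not FlinkML's API: it assumes that a missing entry is encoded as NaN and works on a Python list instead of a Flink `Vector`. The names `Strategy`, `fill_value` and `impute_vector` are hypothetical.

```python
import math
import statistics
from collections import Counter
from enum import Enum


class Strategy(Enum):
    """The three strategies named in the pull request (hypothetical enum)."""
    MEAN = "mean"
    MEDIAN = "median"
    MOST_FREQUENT = "most_frequent"


def fill_value(values, strategy):
    """Compute the replacement value from the observed entries.

    Assumption: NaN marks a missing entry.
    """
    present = [v for v in values if not math.isnan(v)]
    if not present:
        raise ValueError("cannot impute: every entry is missing")
    if strategy is Strategy.MEAN:
        return statistics.mean(present)
    if strategy is Strategy.MEDIAN:
        return statistics.median(present)
    # MOST_FREQUENT: among ties, most_common keeps first-seen order.
    return Counter(present).most_common(1)[0][0]


def impute_vector(values, strategy):
    """Return a copy of `values` with every NaN replaced by the fill value."""
    fill = fill_value(values, strategy)
    return [fill if math.isnan(v) else v for v in values]
```

For example, `impute_vector([1.0, float("nan"), 2.0, 2.0, 7.0], Strategy.MEAN)` returns `[1.0, 3.0, 2.0, 2.0, 7.0]`, while `Strategy.MEDIAN` fills in `2.0`. The statistic is computed only from the observed entries, so the missing values never bias it.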
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/p4nna/flink imputer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3659.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3659
----
commit 88514a98642763c5ad962efecc44bef887b84110
Author: p4nna
Date: 2017-03-30T08:00:33Z
Added an imputer class with Strategy class
The imputer imputes missing values into a sparse DataSet of Vectors, using one
of the strategies defined in the Strategy enum class (mean, median or most
frequent value), applied per row or per column
commit d17c6de2ad9456a58d24ac4cda44b5ef5ce5c216
Author: p4nna
Date: 2017-03-30T08:01:47Z
deleted class from the wrong location
commit e4b336fdbf93084c30a8ee0067efcd7a4729c0e1
Author: p4nna
Date: 2017-03-30T08:02:07Z
deleted class from the wrong location
commit ee6d57cfa669876f983cbf10eb6ffdd02b5c3052
Author: p4nna
Date: 2017-03-30T08:04:04Z
added imputer class with strategy class
the imputer imputes values into a sparse DataSet of vectors with different
strategies (mean, median or most frequent value, as listed in the strategy class)
commit 57524586cbd63e2f0dfdc70cb34df82e6451c3be
Author: p4nna
Date: 2017-03-30T08:04:47Z
added a test class for the new imputer class
commit 72ebd5e210f583cd7e8df21ea8d73c06e835e198
Author: p4nna
Date: 2017-03-30T08:08:49Z
[FLINK-5785] Add an Imputer for preparing data,
removed unnecessary things and comments, added license
commit 31dbfc704247b0c4723d6d3091a16759fbe18041
Author: p4nna
Date: 2017-03-30T08:09:26Z
[FLINK-5785] Add an Imputer for preparing data
added license
commit d0f7b816bea49090633b4bc85762bbf70b192b27
Author: p4nna
Date: 2017-03-30T08:10:03Z
[FLINK-5785] Add an Imputer for preparing data
added license
commit 76f996e2ddc5d912c947f20e2109bd53973c8091
Author: p4nna
Date: 2017-03-30T08:10:33Z
[FLINK-5785] Add an Imputer for preparing data
added license
commit d533805c7b37888632238ce87e73e6ef9d081d02
Author: p4nna
Date: 2017-03-31T15:54:37Z
[FLINK-5785] Add an Imputer for preparing data
should work now.
commit 10dcdfab0ea27e6191cf6d0efad05a563f389ba4
Author: p4nna
Date: 2017-03-31T15:56:04Z
[FLINK-5785] Add an Imputer for preparing data
was in wrong place
commit 8e67f01ba1fb707b808473f4961902542aaca369
Author: p4nna
Date: 2017-03-31T15:56:21Z
[FLINK-5785] Add an Imputer for preparing data
was in wrong place
commit c3fdc87e0e9fc07785b4b4b8dc2b1fde4c756d35
Author: p4nna
Date: 2017-03-31T15:56:59Z
[FLINK-5785] Add an Imputer for preparing data
should work now
commit 07507b5ca0f1cfebc38f96bb8db32c10f2186bbf
Author: p4nna
Date: 2017-03-31T15:57:37Z
[FLINK-5785] Add an Imputer for preparing data
tests should work now
----
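The commit messages describe imputing per row or per column. The difference can be sketched on a small dense matrix. This is a hedged plain-Python illustration, not the pull request's implementation: it uses a list of equal-length rows instead of a Flink DataSet, assumes NaN marks a missing entry, and uses only the mean strategy for brevity. `impute_matrix` and its `axis` argument are hypothetical names.

```python
import math
import statistics


def impute_matrix(rows, axis):
    """Replace NaNs with the mean of the row (axis="row") or of the
    column (axis="column") in which they sit.

    Assumptions: `rows` is a list of equal-length lists of floats and
    NaN marks a missing entry.
    """
    if axis == "column":
        # Transpose, impute each former column as a row, then transpose back.
        columns = [list(col) for col in zip(*rows)]
        return [list(row) for row in zip(*impute_matrix(columns, "row"))]
    if axis != "row":
        raise ValueError(f"unknown axis: {axis!r}")
    result = []
    for row in rows:
        present = [v for v in row if not math.isnan(v)]
        if not present:
            raise ValueError("cannot impute a line with no observed values")
        fill = statistics.mean(present)
        result.append([fill if math.isnan(v) else v for v in row])
    return result
```

With `m = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]`, row-wise imputation yields `[[1.0, 1.0], [3.0, 4.0], [8.0, 8.0]]`, while column-wise imputation yields `[[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]`. In a distributed setting the two are not equally cheap: a row can be filled locally from the vector itself, whereas a column statistic would need an aggregation over all vectors before the fill pass.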
> Add an Imputer for preparing data
> ---------------------------------
>
> Key: FLINK-5785
> URL: https://issues.apache.org/jira/browse/FLINK-5785
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Stavros Kontopoulos
> Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values,
> either using the mean, the median or the most frequent value of the row or
> column in which the missing values are located. This class also allows for
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py
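The issue also asks that the Imputer allow different encodings for missing values. A minimal sketch of how such a configurable marker could work, assuming a hypothetical `marker` parameter (default NaN) and the mean strategy; this is not FlinkML's or scikit-learn's API. The one subtlety is that NaN never compares equal to itself, so it cannot be detected with `==`.

```python
import math


def make_missing_test(marker):
    """Return a predicate recognizing the chosen missing-value encoding.

    NaN needs special handling: NaN != NaN, so an equality check would
    never match it.
    """
    if isinstance(marker, float) and math.isnan(marker):
        return lambda v: isinstance(v, float) and math.isnan(v)
    return lambda v: v == marker


def impute_mean(values, marker=float("nan")):
    """Replace every entry equal to `marker` with the mean of the others.

    `marker` is a hypothetical parameter; NaN is assumed as the default.
    """
    is_missing = make_missing_test(marker)
    present = [v for v in values if not is_missing(v)]
    if not present:
        raise ValueError("cannot impute: every entry is missing")
    fill = sum(present) / len(present)
    return [fill if is_missing(v) else v for v in values]
```

For instance, `impute_mean([2, -1, 4], marker=-1)` treats `-1` as missing and returns `[2, 3.0, 4]`, and `impute_mean([None, 6, 2], marker=None)` returns `[4.0, 6, 2]`.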
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)