zhipeng93 opened a new pull request, #222:
URL: https://github.com/apache/flink-ml/pull/222
## What is the purpose of the change
This PR fixes a bug in the quantile strategy of
KBinsDiscretizer.
- In the original version, bins with zero width were removed by simply merging
each one with the next bin. However, this is not always correct. For example,
if the computed histogram is [0, 0, 0, 1], it is transformed into [0, 1], so
0 and 1 are mapped into the same bin, which is wrong.
- In this PR, the histogram [0, 0, 0, 1] is transformed into [0, 0.5, 1]
instead. It is computed with the following logic:
  - Remove repeated elements in the histogram so that no three consecutive
  elements are the same.
  - If two consecutive elements are the same, replace the second one with
  the average of the elements before and after it.
  - If the last two elements in the histogram are the same, remove the last
  one.
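
The steps above can be sketched roughly as follows. This is only an illustration of the described logic, not the code in this PR: the class name `BinEdgeSketch` and the method name `removeZeroWidthBins` are hypothetical, the input is assumed to be sorted in non-decreasing order, and dropping a trailing duplicate before the averaging step is an assumed ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the class changed in this PR.
final class BinEdgeSketch {

    // Assumes `edges` is sorted in non-decreasing order.
    static double[] removeZeroWidthBins(double[] edges) {
        // Step 1: keep at most two consecutive copies of any value.
        List<Double> kept = new ArrayList<>();
        for (double e : edges) {
            int n = kept.size();
            if (n >= 2 && kept.get(n - 1) == e && kept.get(n - 2) == e) {
                continue;
            }
            kept.add(e);
        }

        // Step 3 (applied early, an assumption): drop a trailing duplicate,
        // since it has no following element to average with.
        int n = kept.size();
        if (n >= 2 && kept.get(n - 1).equals(kept.get(n - 2))) {
            kept.remove(n - 1);
        }

        // Step 2: replace the second element of each equal pair with the
        // average of its two neighbours.
        for (int i = 1; i + 1 < kept.size(); i++) {
            if (kept.get(i).equals(kept.get(i - 1))) {
                kept.set(i, (kept.get(i - 1) + kept.get(i + 1)) / 2);
            }
        }

        double[] result = new double[kept.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = kept.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // The example from the description above.
        double[] fixed = removeZeroWidthBins(new double[] {0, 0, 0, 1});
        System.out.println(Arrays.toString(fixed)); // prints [0.0, 0.5, 1.0]
    }
}
```

With this sketch, 0 and 1 from the example land in different bins, because the duplicated edge becomes 0.5 instead of being merged away.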
## Brief change log
- Changed the logic for handling zero-width bins.
- Added a unit test to verify it.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)