yunfengzhou-hub opened a new pull request, #113:
URL: https://github.com/apache/flink-ml/pull/113
This PR optimizes the performance of the one-hot encoder algorithm with the
following modifications:
- Restructures the DAG of OneHotEncoder so that the very first stream
operator pre-processes the input data with aggregation operations, which
reduces the data transmission overhead.
- Avoids an unnecessary `String.format()` call when passing the error message
to `Preconditions.checkArgument`.
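
For illustration, a minimal sketch of the second point. It assumes the precondition helper accepts a message template plus arguments and only formats them on failure. `LazyMessageSketch` and its `formatCalls` counter are hypothetical stand-ins, not code from this PR or from Flink.

```java
/** Hypothetical stand-in for a precondition helper; not the Flink class itself. */
final class LazyMessageSketch {
    /** Counts how often a message was actually formatted (illustration only). */
    static int formatCalls = 0;

    /** Formats the error message only when the condition fails. */
    static void checkArgument(boolean condition, String template, Object... args) {
        if (!condition) {
            formatCalls++;
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args) {
        double value = 3.0;
        // Eager variant (costly on a per-record path): the message is built on
        // every call and thrown away whenever the check passes:
        //   checkArgument(value >= 0, String.format("Value %s must be non-negative.", value));
        // Lazy variant: pass the template and the raw arguments instead.
        for (int i = 0; i < 1_000_000; i++) {
            checkArgument(value >= 0, "Value %s must be non-negative.", value);
        }
        System.out.println("format calls: " + formatCalls);
    }
}
```

The lazy variant still allocates a varargs array and boxes the argument on each call, so it is cheaper than eager formatting but not free.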
Together, these optimizations reduce the net runtime of OneHotEncoder
benchmark jobs to about 1/6 of the previous runtime.
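
The pre-aggregation idea can be sketched in plain Java, without Flink. The sketch assumes the aggregate is a per-column maximum, which the description above does not state explicitly. `LocalPreAggregationSketch`, `localMax` and `merge` are hypothetical names, and the lists stand in for parallel subtasks.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of local pre-aggregation; not the operator code from this PR. */
final class LocalPreAggregationSketch {
    /** Reduces all rows seen by one subtask to a single per-column maximum. */
    static int[] localMax(List<int[]> rows, int numColumns) {
        int[] max = new int[numColumns];
        Arrays.fill(max, Integer.MIN_VALUE);
        for (int[] row : rows) {
            for (int c = 0; c < numColumns; c++) {
                max[c] = Math.max(max[c], row[c]);
            }
        }
        return max;
    }

    /** Merges the partial results; only these small arrays need to be transmitted. */
    static int[] merge(List<int[]> partials, int numColumns) {
        return localMax(partials, numColumns);
    }

    public static void main(String[] args) {
        List<int[]> partition1 = Arrays.asList(new int[] {0, 2}, new int[] {1, 0});
        List<int[]> partition2 = Arrays.asList(new int[] {3, 1}, new int[] {2, 1});
        int[] global =
                merge(Arrays.asList(localMax(partition1, 2), localMax(partition2, 2)), 2);
        System.out.println(Arrays.toString(global)); // prints [3, 2]
    }
}
```

Each subtask forwards one small array instead of every input row, which is where the reduction in transmitted data would come from.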
This PR also does the following:
- Adds an example benchmark JSON file for OneHotEncoder.
- Supports generating distinct double values in `DoubleGenerator`.
  - This modification has only a slight impact on the performance of
  `DoubleGenerator`.
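
One way such a generator option could look is sketched below. It assumes that "distinct" means drawing from a bounded set of integer-valued doubles. The parameter name `numDistinctValues` and the class `DistinctDoubleSketch` are hypothetical and are not taken from `DoubleGenerator`.

```java
import java.util.Random;

/** Hypothetical sketch of a double generator with an optional bounded value set. */
final class DistinctDoubleSketch {
    /**
     * Returns an integer-valued double in [0, numDistinctValues) when the bound is
     * positive, otherwise a continuous double in [0, 1).
     */
    static double next(Random random, int numDistinctValues) {
        if (numDistinctValues > 0) {
            return random.nextInt(numDistinctValues);
        }
        return random.nextDouble();
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(next(random, 3));
        }
    }
}
```

Under this assumption the only extra work per value is a single branch, which fits the slight performance impact noted above.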