yunfengzhou-hub opened a new pull request, #113:
URL: https://github.com/apache/flink-ml/pull/113

   This PR optimizes the performance of the one-hot encoder algorithm with the 
following modifications:
   
   - Restructures the DAG of OneHotEncoder so that the very first stream 
operator pre-processes the input data with aggregation operations, which 
reduces the data transmission overhead.
   - Avoids an unnecessary `String.format()` call when passing the error 
message to `Preconditions.checkArgument`.
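
To illustrate the second point: formatting the message at the call site costs a `String.format()` call on every record, even when the check passes. The sketch below uses a stand-in helper class (`LazyMessageSketch`, `checkEager`, and `checkLazy` are hypothetical names, not Flink classes) to show a template-style check that formats only on failure. It is an illustration under those assumptions, not the code changed in this PR.

```java
/** Hypothetical stand-in for a precondition utility; not Flink's own class. */
class LazyMessageSketch {

    /** Eager style: the caller has already paid for building the message. */
    static void checkEager(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Lazy style: the template is formatted only when the check fails. */
    static void checkLazy(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args) {
        double[] values = {0.0, 1.0, 2.0};
        for (double value : values) {
            // String.format() runs on every iteration, although the check always passes.
            checkEager(value >= 0, String.format("Value %s must be non-negative.", value));
            // No formatting work happens here unless the condition is false.
            checkLazy(value >= 0, "Value %s must be non-negative.", value);
        }
    }
}
```

In a per-record code path the lazy form removes the formatting work from the common, non-failing case.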
   
   These optimizations together reduce the net runtime of OneHotEncoder 
benchmark jobs to about 1/6 of the original.
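
The pre-aggregation idea behind the DAG change can be sketched in plain Java. The sketch assumes the aggregate of interest is a per-column maximum and models partitions as in-memory lists. `LocalPreAggregationSketch`, `localMax`, and `mergeMax` are hypothetical names, and this is not the operator code in this PR. Each partition folds its rows into one small array before anything is sent downstream, so only one array per partition is transmitted instead of every row.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of local pre-aggregation; not the Flink ML implementation. */
class LocalPreAggregationSketch {

    /** Folds all rows of one partition into a single per-column maximum. */
    static double[] localMax(List<double[]> partition, int numColumns) {
        double[] max = new double[numColumns];
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : partition) {
            for (int i = 0; i < numColumns; i++) {
                max[i] = Math.max(max[i], row[i]);
            }
        }
        return max;
    }

    /** Merges the partial results; only one array per partition reaches this step. */
    static double[] mergeMax(List<double[]> partials, int numColumns) {
        return localMax(partials, numColumns);
    }

    public static void main(String[] args) {
        List<double[]> partition0 = Arrays.asList(new double[] {0, 2}, new double[] {1, 0});
        List<double[]> partition1 = Arrays.asList(new double[] {3, 1});
        double[] global =
                mergeMax(Arrays.asList(localMax(partition0, 2), localMax(partition1, 2)), 2);
        System.out.println(Arrays.toString(global));
    }
}
```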
   
   This PR also does the following:
   - Adds an example benchmark JSON file for OneHotEncoder.
   - Supports generating distinct double values in `DoubleGenerator`.
     - This modification has only a slight influence on the performance of 
`DoubleGenerator`.

