[GitHub] [flink-ml] jiangxin369 commented on pull request #246: [FLINK-32545] Improves the performance by optimizing row operations

via GitHub Thu, 06 Jul 2023 23:55:57 -0700


jiangxin369 commented on PR #246:
URL: https://github.com/apache/flink-ml/pull/246#issuecomment-1624846507


   @lindong28 Thanks for the reply.
   
   > For those algorithms which have the corresponding benchmark defined in 
./flink-ml-benchmark/src/main/resources, can you run the benchmarks and 
document the performance improvements in the PR description?
   
   For the algorithms that already have a benchmark defined in 
"./flink-ml-benchmark/src/main/resources", I have run the benchmarks on my 
workstation, but most of them show performance similar to before. The reason 
is that the other algorithms are configured with a much smaller dataset (1/10 
the size of the Bucketizer one) so that they do not run too long. The PR 
improves performance by reducing the cost of creating rows, so the effect is 
positively correlated with dataset size.
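   
   As a rough illustration of why a per-row saving only becomes visible on 
larger datasets, here is a minimal sketch. It is not the code in this PR: the 
class `RowCopySketch` and both method names are hypothetical, and plain 
`Object[]` arrays stand in for Flink `Row` objects. It compares appending a 
field through a growable list with appending through one pre-sized array copy, 
then times both for two row counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Object[] stands in for a row; this is not Flink ML code.
public class RowCopySketch {

    // Appends one field by copying element by element into a growable list.
    public static Object[] appendViaList(Object[] row, Object extra) {
        List<Object> fields = new ArrayList<>();
        for (Object f : row) {
            fields.add(f);
        }
        fields.add(extra);
        return fields.toArray();
    }

    // Appends one field using a single pre-sized array and a bulk copy.
    public static Object[] appendViaArrayCopy(Object[] row, Object extra) {
        Object[] out = new Object[row.length + 1];
        System.arraycopy(row, 0, out, 0, row.length);
        out[row.length] = extra;
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {1, "a", 3.0};
        long sink = 0; // consume results so the loops are not trivially removed
        for (int n : new int[] {100_000, 1_000_000}) {
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                sink += appendViaList(row, i).length;
            }
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                sink += appendViaArrayCopy(row, i).length;
            }
            long t2 = System.nanoTime();
            System.out.printf(
                    "rows=%d list=%d ms arraycopy=%d ms%n",
                    n, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
        System.out.println("sink=" + sink);
    }
}
```

   Any saving here is paid once per row, so the total difference grows with 
the number of rows. On a small dataset it is likely hidden by fixed job 
overhead. The timings are only indicative; the benchmark configs in 
flink-ml-benchmark are the reliable way to measure the actual effect.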
   
   > Can you confirm that Bucketizer benchmark results are obtained by running 
`./flink-ml-benchmark/src/main/resources/bucketizer-benchmark.json`?
   
   Yes, they are.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

