0lai0 opened a new pull request, #1136: URL: https://github.com/apache/mahout/pull/1136
### Related Issues

Close [#1125](https://github.com/apache/mahout/issues/1125)

### Changes

- [x] Bug fix
- [ ] New feature
- [ ] Refactoring
- [ ] Documentation
- [ ] Test
- [ ] CI/CD pipeline
- [ ] Other

### Why

In `run_throughput_pipeline`, each iteration of the warmup and timed loops called `generate_batch(...)`, which allocates a new `Vec<f64>` (`batch_size * vector_len` elements). For high-frequency throughput benchmarks, this creates significant memory allocation pressure and adds allocator overhead to the timing results, making them reflect CPU allocation time rather than pure GPU `encode_batch` throughput.

### How

- Pre-allocate a single batch buffer before entering the loops
- Reuse the buffer in each iteration via a new `fill_batch_inplace(...)` function that writes data in-place
- Refactor `generate_batch` to delegate to `fill_batch_inplace`, eliminating code duplication

## Checklist

- [x] Added or updated unit tests for all changes
- [x] Added or updated documentation for all changes
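
For reviewers, the buffer-reuse pattern described under "How" can be sketched roughly as below. This is a minimal illustration and not the actual diff. The parameter lists of `fill_batch_inplace` and `generate_batch`, the fill formula, and the `encode_batch_stub` and `run_throughput_sketch` helpers are all assumptions made for the example; the stub stands in for the real GPU `encode_batch` call.

```rust
use std::time::{Duration, Instant};

/// Hypothetical signature: overwrites `buf` with deterministic values for
/// batch `batch_idx`. It never allocates; it only writes into the given slice.
fn fill_batch_inplace(buf: &mut [f64], batch_idx: usize, vector_len: usize) {
    assert!(vector_len > 0 && buf.len() % vector_len == 0);
    for (i, slot) in buf.iter_mut().enumerate() {
        let sample = i / vector_len;
        let feature = i % vector_len;
        // Placeholder fill formula (an assumption): values in [0, 1).
        *slot = ((batch_idx + sample + feature) % 1000) as f64 / 1000.0;
    }
}

/// Allocating variant kept for existing callers; it delegates to the
/// in-place function so the fill logic lives in one place.
fn generate_batch(batch_idx: usize, batch_size: usize, vector_len: usize) -> Vec<f64> {
    let mut buf = vec![0.0; batch_size * vector_len];
    fill_batch_inplace(&mut buf, batch_idx, vector_len);
    buf
}

/// Stand-in for the GPU `encode_batch` call (assumption: the real call
/// borrows the batch as a slice).
fn encode_batch_stub(batch: &[f64]) -> f64 {
    batch.iter().sum()
}

/// Hypothetical pipeline: one allocation up front, then reuse in both loops.
fn run_throughput_sketch(
    warmup: usize,
    iters: usize,
    batch_size: usize,
    vector_len: usize,
) -> (f64, Duration) {
    // Single allocation, made before any timing starts.
    let mut buf = vec![0.0f64; batch_size * vector_len];

    // Warmup loop: results are discarded.
    for i in 0..warmup {
        fill_batch_inplace(&mut buf, i, vector_len);
        let _ = encode_batch_stub(&buf);
    }

    // Timed loop: no per-iteration allocation inside the measured region.
    let start = Instant::now();
    let mut checksum = 0.0;
    for i in 0..iters {
        fill_batch_inplace(&mut buf, warmup + i, vector_len);
        checksum += encode_batch_stub(&buf);
    }
    (checksum, start.elapsed())
}

fn main() {
    let (checksum, elapsed) = run_throughput_sketch(2, 10, 64, 128);
    println!("checksum = {checksum}, elapsed = {elapsed:?}");
}
```

Note that the timed loop in this sketch still includes the cost of filling the buffer; only the per-iteration allocation is removed, which is the overhead the "Why" section identifies.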
