[PR] [QDP] Optimize run_throughput_pipeline to avoid per-iteration Vec allocations [mahout]

via GitHub Fri, 06 Mar 2026 09:09:51 -0800


0lai0 opened a new pull request, #1136:
URL: https://github.com/apache/mahout/pull/1136


   ### Related Issues
   
   Closes [#1125](https://github.com/apache/mahout/issues/1125)
   
   ### Changes
   
   - [x] Bug fix
   - [ ] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Why
   
   In `run_throughput_pipeline`, each iteration of the warmup and timed loops called `generate_batch(...)`, which allocates a new `Vec<f64>` of `batch_size * vector_len` elements. In a high-frequency throughput benchmark, this puts the allocator on the hot path of every iteration, so the timing results include CPU-side allocation time instead of measuring only GPU `encode_batch` throughput.
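
   To put the allocation pressure in concrete terms, the following sketch computes how many bytes the pre-change loops request from the allocator. The helper names and the example parameter values are hypothetical and chosen for illustration; they are not the benchmark's actual defaults.

   ```rust
   /// Bytes requested by one `generate_batch`-style call that allocates a
   /// fresh Vec<f64> of batch_size * vector_len elements (hypothetical helper).
   fn bytes_per_iteration(batch_size: usize, vector_len: usize) -> usize {
       batch_size * vector_len * std::mem::size_of::<f64>()
   }

   /// Total bytes allocated across warmup + timed iterations when every
   /// iteration allocates a new batch, i.e. the pre-change behavior.
   fn total_bytes_allocated(batch_size: usize, vector_len: usize, warmup: usize, iters: usize) -> usize {
       (warmup + iters) * bytes_per_iteration(batch_size, vector_len)
   }
   ```

   For example, with `batch_size = 64`, `vector_len = 1024`, 10 warmup and 100 timed iterations, the old loops allocate 110 buffers of 524,288 bytes (512 KiB) each, 55 MiB in total, whereas a pre-allocated buffer costs a single 512 KiB allocation.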
   
   ### How
   
   - Pre-allocate a single batch buffer before entering the loops
   - Reuse the buffer in each iteration via a new `fill_batch_inplace(...)` function that writes the data in place
   - Refactor `generate_batch` to delegate to `fill_batch_inplace`, eliminating code duplication
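
   The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the argument lists of `fill_batch_inplace` and `generate_batch`, the data pattern they write, `encode_batch_stub` (a placeholder for the GPU `encode_batch` call), and `run_throughput_pipeline_sketch` are all hypothetical.

   ```rust
   use std::time::{Duration, Instant};

   /// Hypothetical stand-in for `fill_batch_inplace`: overwrites `buf` with a
   /// deterministic pattern for batch `batch_idx`. The real QDP data pattern
   /// is not reproduced here.
   fn fill_batch_inplace(buf: &mut [f64], batch_idx: usize, batch_size: usize, vector_len: usize) {
       let n = batch_size * vector_len;
       assert_eq!(buf.len(), n, "buffer must hold batch_size * vector_len elements");
       let offset = batch_idx * n;
       for (i, x) in buf.iter_mut().enumerate() {
           // Cheap, non-zero values; any deterministic formula would do.
           *x = ((offset + i) % 97) as f64 + 1.0;
       }
   }

   /// Allocating wrapper kept for existing callers. It delegates to
   /// `fill_batch_inplace`, so the two code paths cannot drift apart.
   fn generate_batch(batch_idx: usize, batch_size: usize, vector_len: usize) -> Vec<f64> {
       let mut buf = vec![0.0; batch_size * vector_len];
       fill_batch_inplace(&mut buf, batch_idx, batch_size, vector_len);
       buf
   }

   /// Hypothetical placeholder for the GPU `encode_batch` call.
   fn encode_batch_stub(batch: &[f64]) -> f64 {
       batch.iter().sum()
   }

   /// Loop shape after the change: one allocation up front, refilled each iteration.
   /// Returns (vectors encoded in the timed loop, elapsed time of the timed loop).
   fn run_throughput_pipeline_sketch(
       warmup: usize,
       iters: usize,
       batch_size: usize,
       vector_len: usize,
   ) -> (usize, Duration) {
       // Single allocation, reused by every warmup and timed iteration.
       let mut batch = vec![0.0; batch_size * vector_len];
       let buf_ptr = batch.as_ptr();
       let mut sink = 0.0;

       for i in 0..warmup {
           fill_batch_inplace(&mut batch, i, batch_size, vector_len);
           sink += encode_batch_stub(&batch);
       }

       let start = Instant::now();
       for i in 0..iters {
           fill_batch_inplace(&mut batch, warmup + i, batch_size, vector_len);
           sink += encode_batch_stub(&batch);
       }
       let elapsed = start.elapsed();

       // The buffer was never reallocated: same heap address as before the loops.
       assert_eq!(buf_ptr, batch.as_ptr());
       // Keep `sink` observable so the work is not optimized away.
       std::hint::black_box(sink);

       (iters * batch_size, elapsed)
   }
   ```

   In this sketch the in-place fill still runs inside the timed loop; only the allocation is hoisted out, which is the overhead the change targets. Because `generate_batch` is a thin wrapper, a unit test can check that both paths produce identical batches.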
   
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes
   

