[GitHub] [arrow] pitrou commented on pull request #34323: GH-34322: [C++][Parquet] Encoding Microbench for ByteArray

via GitHub Tue, 28 Feb 2023 02:14:50 -0800


pitrou commented on PR #34323:
URL: https://github.com/apache/arrow/pull/34323#issuecomment-1447914280


   A couple of general comments:
   * we should avoid combinatorial explosion of benchmark variations; the 
longer benchmarks take to run, the rarer it is to run them; you could easily 
reduce the number of "max-string-length" values for example
   * we should test only meaningful or reasonable parameters; if "batch-size:8" 
means processing 8 items at a time, I think the easy answer is "don't do it"; 
Parquet encoding/decoding needs batch sizes in the hundreds or thousands to be 
efficient
   * the memory footprint of each benchmark should be similar and reasonable; 
if "byte_array_bytes" is the memory footprint then this must be fixed; most of 
the time the footprint should not be larger than ~10 MB
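
   To illustrate the three points above, here is a minimal sketch of how the
parameter grid could be kept small while holding the footprint near ~10 MB.
Everything in it is hypothetical and not taken from the PR: the names
`BenchParams`, `NumValuesForBudget` and `MakeParams`, the chosen lengths and
batch sizes, and the assumption that each value costs its maximum length plus a
4-byte length prefix.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical byte budget for one benchmark run (~10 MB).
constexpr int64_t kTargetFootprintBytes = 10 * 1024 * 1024;

// Hypothetical parameter bundle; not an Arrow or Parquet type.
struct BenchParams {
  int64_t max_string_length;
  int64_t batch_size;
  int64_t num_values;
};

// Derive the value count from the budget instead of fixing it, so that
// long strings do not inflate the memory footprint.
// Assumption: worst case, every value has max length plus a 4-byte prefix.
int64_t NumValuesForBudget(int64_t max_string_length) {
  const int64_t bytes_per_value = max_string_length + 4;
  return kTargetFootprintBytes / bytes_per_value;
}

// A small grid: three string lengths and two batch sizes in the
// hundreds-to-thousands range, six variations in total.
std::vector<BenchParams> MakeParams() {
  std::vector<BenchParams> params;
  for (int64_t max_len : {8, 64, 1024}) {
    for (int64_t batch_size : {512, 2048}) {
      params.push_back({max_len, batch_size, NumValuesForBudget(max_len)});
    }
  }
  return params;
}
```

   Each entry could then be fed to whatever registration mechanism the
benchmark file uses; the point is only that the number of values follows from
the budget, so every variation touches roughly the same amount of memory.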
   


