jp0317 opened a new pull request, #39818: URL: https://github.com/apache/arrow/pull/39818
### Rationale for this change

Currently each invocation of `SkipRecords()` for non-repeated fields will [create a new buffer](https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc#L1482) object[1]. I think it is probably worth keeping the buffer object alive and just resizing it for each skip, since the buffer is just a bitmap (i.e., it should remain quite small even if we don't free its memory after a skip).

Performance results are as follows.

Keep buffer object alive:

| Benchmark | Time | CPU | Iterations | UserCounters |
|---|---|---|---|---|
| RecordReaderSkipRecords/Repetition:0/BatchSize:1 | 29958201 ns | 29943377 ns | 23 | bytes_per_second=163.068M/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:10 | 3190298 ns | 3190524 ns | 227 | bytes_per_second=1.49454G/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:100 | 479056 ns | 480437 ns | 1500 | bytes_per_second=9.92507G/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:1000 | 256497 ns | 257725 ns | 2763 | bytes_per_second=18.5018G/s |
| RecordReaderSkipRecords/Repetition:1/BatchSize:1000 | 2910364 ns | 2910479 ns | 239 | bytes_per_second=893.189M/s |
| RecordReaderSkipRecords/Repetition:2/BatchSize:1000 | 20539472 ns | 20535632 ns | 34 | bytes_per_second=135.007M/s |

Recreate upon each skip (current behavior):

| Benchmark | Time | CPU | Iterations | UserCounters |
|---|---|---|---|---|
| RecordReaderSkipRecords/Repetition:0/BatchSize:1 | 33261760 ns | 33199124 ns | 21 | bytes_per_second=147.077M/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:10 | 3256993 ns | 3254609 ns | 216 | bytes_per_second=1.46511G/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:100 | 492856 ns | 493377 ns | 1447 | bytes_per_second=9.66477G/s |
| RecordReaderSkipRecords/Repetition:0/BatchSize:1000 | 262449 ns | 263227 ns | 2694 | bytes_per_second=18.1151G/s |
| RecordReaderSkipRecords/Repetition:1/BatchSize:1000 | 2996951 ns | 2997148 ns | 235 | bytes_per_second=867.36M/s |
| RecordReaderSkipRecords/Repetition:2/BatchSize:1000 | 20864734 ns | 20850593 ns | 34 | bytes_per_second=132.968M/s |

### What changes are included in this PR?

Change a local (stack) buffer object into a private member of the record reader.

### Are these changes tested?

Yes, with microbenchmarks.

### Are there any user-facing changes?

No.
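
For illustration, the change listed under "What changes are included in this PR?" (a per-call buffer turned into a member that is resized on each skip) can be sketched roughly as below. This is a minimal standalone sketch and not the actual patch: `SkipScratchSketch`, `BitmapBytes`, `SkipWithFreshBuffer`, and `SkipWithReusedBuffer` are hypothetical names, and `std::vector` is assumed as a stand-in for Arrow's resizable buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the record reader; none of these names come from
// the Arrow code base, and std::vector replaces Arrow's buffer type.
class SkipScratchSketch {
 public:
  // Bytes needed for a validity bitmap covering num_records records.
  static std::size_t BitmapBytes(std::int64_t num_records) {
    assert(num_records >= 0);
    return static_cast<std::size_t>((num_records + 7) / 8);
  }

  // Pattern before the change: a new local buffer is built on every skip.
  std::size_t SkipWithFreshBuffer(std::int64_t num_records) {
    std::vector<std::uint8_t> bitmap(BitmapBytes(num_records), 0);
    ++fresh_constructions_;
    return bitmap.size();
  }

  // Pattern after the change: the member buffer is kept alive and resized.
  // Memory is reallocated only when a skip needs more than the current capacity.
  std::size_t SkipWithReusedBuffer(std::int64_t num_records) {
    const std::size_t needed = BitmapBytes(num_records);
    if (needed > scratch_.capacity()) ++reused_growths_;
    scratch_.resize(needed);
    std::fill(scratch_.begin(), scratch_.end(), std::uint8_t{0});
    return scratch_.size();
  }

  int fresh_constructions() const { return fresh_constructions_; }
  int reused_growths() const { return reused_growths_; }

 private:
  std::vector<std::uint8_t> scratch_;  // lives as long as the reader object
  int fresh_constructions_ = 0;        // buffers built by the "fresh" pattern
  int reused_growths_ = 0;             // times the member had to grow
};
```

In this sketch, repeated skips of the same batch size grow the member buffer once, while the per-call variant builds a buffer every time. The trade-off is that the member keeps its memory between skips, which is small for a bitmap.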
