fatemehp commented on PR #17877: URL: https://github.com/apache/arrow/pull/17877#issuecomment-1405674039
Updating the benchmark numbers. The 10X improvement reported above is not accurate; it was caused by a bug in the code for reading optional fields. On the benchmark data, the improvement is closer to 10% for optional fields and 40% for repeated fields. Nevertheless, we need this API since some readers may want to read dense.

```
--------------------------------------------------------------------------------------------------------------------------
Benchmark                                                            Time          CPU  Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------
ColumnReaderSkipInt32/Repetition:0/BatchSize:1000                  234597 ns    233231 ns  2529 bytes_per_second=20.4448G/s
ColumnReaderSkipInt32/Repetition:1/BatchSize:1000                 1613926 ns   1608305 ns   442 bytes_per_second=1.57848G/s
ColumnReaderSkipInt32/Repetition:2/BatchSize:1000                 2469379 ns   2462899 ns   280 bytes_per_second=1119.15M/s
ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000             257132 ns    256459 ns  2457 bytes_per_second=18.5931G/s
ColumnReaderReadBatchInt32/Repetition:1/BatchSize:1000            1629986 ns   1625472 ns   442 bytes_per_second=1.56181G/s
ColumnReaderReadBatchInt32/Repetition:2/BatchSize:1000            2530858 ns   2514407 ns   282 bytes_per_second=1096.22M/s
RecordReaderSkipRecords/Repetition:0/BatchSize:1000                267614 ns    267870 ns  2459 bytes_per_second=17.801G/s
RecordReaderSkipRecords/Repetition:1/BatchSize:1000               1615133 ns   1613831 ns   443 bytes_per_second=1.57308G/s
RecordReaderSkipRecords/Repetition:2/BatchSize:1000              11080021 ns  11021551 ns    66 bytes_per_second=250.087M/s
RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:1    312089 ns    312874 ns  2143 bytes_per_second=15.2406G/s
RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:0    320200 ns    320023 ns  2128 bytes_per_second=14.9001G/s
RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:1   7913189 ns   7885560 ns    89 bytes_per_second=329.667M/s
RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:0   8907880 ns   8879646 ns    77 bytes_per_second=292.76M/s
RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:1  11947824 ns  11942741 ns    54 bytes_per_second=230.797M/s
RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:0  20702187 ns  20626693 ns    32 bytes_per_second=133.63M/s
```
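
For reference, a rough sketch of how the roughly 10% and 40% figures can be recovered from the benchmark output. It assumes the comparison is between the `ReadDense:1` and `ReadDense:0` rows of `RecordReaderReadRecords`, and that `Repetition:1` and `Repetition:2` correspond to optional and repeated fields; neither is spelled out in the comment. The helper name `time_reduction_pct` and the dictionary keys are illustrative only.

```python
# Wall-clock times (ns) copied from the RecordReaderReadRecords rows of the
# benchmark output. Assumption: Repetition:1 is the optional column and
# Repetition:2 is the repeated column.
TIMES_NS = {
    "optional": {"dense": 7913189, "non_dense": 8907880},
    "repeated": {"dense": 11947824, "non_dense": 20702187},
}


def time_reduction_pct(dense_ns: int, non_dense_ns: int) -> float:
    """Percentage of wall-clock time saved by the dense read (hypothetical helper)."""
    return 100.0 * (non_dense_ns - dense_ns) / non_dense_ns


REDUCTIONS = {
    kind: time_reduction_pct(t["dense"], t["non_dense"])
    for kind, t in TIMES_NS.items()
}

for kind, pct in REDUCTIONS.items():
    print(f"{kind}: {pct:.1f}% less time with ReadDense:1")
```

Under these assumptions the reduction comes out a little above 11% for the optional column and a little above 42% for the repeated one, in line with the rounded figures quoted in the comment.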
