XiangpengHao opened a new pull request, #8745: URL: https://github.com/apache/arrow-rs/pull/8745
This was originally found by @MikeWalrus.

Basically, the ChunkReader for the async reader is `ColumnChunkData`: https://github.com/apache/arrow-rs/blob/2eabb595d20e691cf0c9c3ccf6a5e1b67472b07b/parquet/src/arrow/in_memory_row_group.rs#L282-L292

`ColumnChunkData` already holds its data as `Bytes`. The original implementation copies the data out of it, only to wrap the copy in a new `Bytes` later. This PR removes that copy.

Normally this should mean performance improvements across the board, but there are some nuances:

1. Zero-copy means we need to hold the underlying buffer longer.
2. The original implementation "accidentally" (I'm not sure whether it was intended) gc'ed the buffer: once the data had been copied out, the original buffer could be freed.
3. To show a meaningful performance difference, we need to use a proper allocator, i.e., mimalloc.

tl;dr: with mimalloc, this change always improves performance, or is at least as fast as the original implementation. Tested locally with `arrow_reader_clickbench`.

cc @tustvold and @alamb, who might know this better.
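
To illustrate the trade-off between zero-copy and buffer lifetime described above, here is a minimal, std-only sketch. `SharedBuf`, `slice`, `copy_slice`, and `owners` are hypothetical names that only model a reference-counted buffer in the spirit of `Bytes`; this is not the code in this PR or the `bytes` crate API.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a ref-counted byte buffer (in the spirit of `Bytes`):
/// a shared allocation plus the range of it that this handle can see.
#[derive(Clone)]
struct SharedBuf {
    data: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl SharedBuf {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Self { data: Arc::new(data), range: 0..len }
    }

    /// Zero-copy: the returned view shares (and keeps alive) the whole allocation.
    fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && r.end <= self.range.len());
        let start = self.range.start + r.start;
        Self { data: Arc::clone(&self.data), range: start..start + r.len() }
    }

    /// Copying: the returned buffer owns a fresh allocation of just `r.len()` bytes.
    fn copy_slice(&self, r: Range<usize>) -> Self {
        Self::new(self.as_slice()[r].to_vec())
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }

    /// Number of handles that keep the underlying allocation alive.
    fn owners(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}

fn main() {
    // Pretend this is a whole column chunk fetched by the async reader.
    let chunk = SharedBuf::new((0u8..100).collect());

    let copied = chunk.copy_slice(10..20); // old behaviour: copy, then re-wrap
    let shared = chunk.slice(10..20); // new behaviour: zero-copy view

    // Both paths expose the same bytes.
    assert_eq!(copied.as_slice(), shared.as_slice());

    // The zero-copy view is a second owner of the 100-byte chunk,
    // while the copy owns its own 10-byte allocation.
    println!("chunk owners: {}", chunk.owners());
    println!("copy owners: {}", copied.owners());

    // Dropping `chunk` frees nothing while `shared` is alive (nuance 1);
    // with only `copied` around, the large buffer could be released (nuance 2).
    drop(chunk);
    println!("bytes pinned by shared view: {}", shared.data.len());
    println!("bytes pinned by copy: {}", copied.data.len());
}
```

In this model the zero-copy path avoids the `to_vec` copy but pins the full chunk for as long as any view exists, which is the memory-retention effect that makes the choice of allocator matter when benchmarking.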
