[GitHub] [arrow-rs] rdettai commented on issue #1163: Use Standard Library IO Abstractions in Parquet

GitBox Sun, 13 Mar 2022 13:48:03 -0700


rdettai commented on issue #1163:
URL: https://github.com/apache/arrow-rs/issues/1163#issuecomment-1066179351



   Hi @tustvold! I am the weird mind who introduced the `ChunkReader` 😄.
   
   The main benefit compared to a regular reader is that you can specify up
front the size of the chunk you plan to read, which lets you set the right
byte range when you issue the GET against the object store. I'm not sure how
we could achieve this with plain `std::io::Read`, which gives the source no
way to learn in advance how many bytes will be read in total. That said, I
ended up offloading the download scheduling to a separate module anyway, and
I think this is what you will want to do in most cases to optimize your link
with the object store (this is also what is done in IOx, isn't it?).
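
   To make the point about chunk sizes concrete, here is a minimal sketch. It
is not the parquet crate's actual code: `FakeObjectStore`, `SizedChunkSource`
and `read_chunk` are hypothetical names invented for this example, and the
"object store" is just an in-memory buffer that records the ranges it is asked
for. The idea is only that a reader which is told `(start, length)` can turn
that into exactly one ranged request.

```rust
use std::cell::RefCell;
use std::io::{self, Cursor, Read};

/// Hypothetical stand-in for an object store client (assumption for this sketch).
struct FakeObjectStore {
    data: Vec<u8>,
    /// Every (start, length) pair that was requested, in order.
    requests: RefCell<Vec<(u64, usize)>>,
}

impl FakeObjectStore {
    /// Simulates a ranged GET. A real client would translate this into
    /// something like an HTTP `Range: bytes=start-(start+length-1)` header.
    fn get_range(&self, start: u64, length: usize) -> io::Result<Vec<u8>> {
        let begin = start as usize;
        let end = begin
            .checked_add(length)
            .filter(|&e| e <= self.data.len())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of object")
            })?;
        self.requests.borrow_mut().push((start, length));
        Ok(self.data[begin..end].to_vec())
    }
}

/// Simplified, hypothetical trait in the spirit of a chunk reader:
/// the caller announces the chunk size before it gets a `Read`.
trait SizedChunkSource {
    type T: Read;
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

impl SizedChunkSource for FakeObjectStore {
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        // Because the length is known here, one request fetches the whole chunk.
        Ok(Cursor::new(self.get_range(start, length)?))
    }
}

/// Reads one chunk fully through the sized interface.
fn read_chunk<S: SizedChunkSource>(src: &S, start: u64, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    src.get_read(start, length)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

   Calling `read_chunk(&store, 10, 5)` on such a store results in a single
recorded request for `(10, 5)`. With a bare `std::io::Read`, the source only
sees the size of each individual buffer passed to `read`, so it would have to
guess how much to fetch ahead.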
   
   Another initial goal was indeed to try to achieve parallelism between 
columns, but I never succeeded because the entire structure of the parquet 
reader was against it, and I didn't have enough Rust experience to fight it 😉.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

