thinkharderdev commented on issue #2489:
URL: 
https://github.com/apache/arrow-datafusion/issues/2489#issuecomment-1127638364

   > Yeah, buffered prefetch is one way to mitigate the small read problem. 
However, it does not allow for coalescing adjacent reads - i.e. you will still 
likely end up with one request per column chunk unless you have tiny columns.
   > 
   > TBC my preference is for 3, which mirrors the new vectored API of S3a, but 
I'm currently working on 2 first to ensure there aren't any fundamental 
integration issues.
   
   Cool. In our case the buffered prefetch helps marginally (but we also have a 
lot of sparse columns, so ours is a slightly special case where prefetching 
does a reasonable job of coalescing adjacent reads). 
   
   https://github.com/apache/arrow-rs/issues/1605 looks like a really good 
idea. We're also working on trying to optimize S3 reads at the moment so if 
there's any way I can help please let me know!
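
For anyone following along, here is a rough sketch of what "coalescing adjacent reads" could look like: byte ranges (say, one per column chunk) are sorted and merged whenever the gap between them is small enough that one larger request is likely cheaper than two small ones. The function name `coalesce_ranges` and the `max_gap` parameter are hypothetical names chosen for illustration; this is not the arrow-rs or DataFusion API, just the general idea under those assumptions.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of arrow-rs or DataFusion): merge byte
/// ranges whose gap is at most `max_gap` bytes, so that each merged range
/// can be fetched with a single request.
fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    // Sort by start offset so that neighbouring ranges become adjacent.
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Overlapping, touching, or separated by a small enough gap:
            // extend the previous range instead of starting a new one.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

With a `max_gap` of 0 only touching or overlapping ranges are merged; a larger value trades some wasted bytes for fewer requests, which is usually the right trade on object stores where per-request latency dominates.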

