[GitHub] [arrow-datafusion] rdettai commented on pull request #2677: Switch to object_store crate (#2489)

GitBox Thu, 30 Jun 2022 00:07:34 -0700


rdettai commented on PR #2677:
URL: 
https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170848585


   > That being said, I'm not really sure I agree that the object store 
abstraction is all that core to DataFusion. It is just an IO abstraction used 
at the edges of plans
   
   That's quite a lot of files modified for switching an "IO abstraction 
used at the edge of plans" 😄. I also believe that reading data in from files 
is _crucial_ to an analytics query engine. It isn't _core_ in the sense that 
you can do things with your engine without it (reading in-memory or streaming 
data...), but it is still one of its main use cases and, more importantly, a 
critical performance bottleneck. And as always with optimization, you 
sometimes need to bend the separation of concerns a bit to reach your goal, 
which means that you will need to tweak the abstraction to get the performance 
you want (as you can see with topics like prefetch strategies). And this can 
become more complicated if we depend on an external store that is not owned 
by us.
   
   TL;DR: I would also be more comfortable with this change if we first 
integrated the object store abstraction into the repository.


