rdettai commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170848585
> That being said, I'm not really sure I agree that the object store abstraction is all that core to DataFusion. It is just an IO abstraction used at the edges of plans

That's quite a lot of files that got modified for switching an "IO abstraction used at the edge of plans" 😄.

I also believe that reading data in from files is very _crucial_ to an analytics query engine. Indeed, it isn't _core_ in the sense that you can do things with your engine without it (reading in-memory or streaming data...), but it is still one of its main use cases and, more importantly, a critical performance bottleneck. And as always with optimization, you sometimes need to bend the separation of concerns a bit to reach your goal, which means that you will need to tweak the abstraction to get the performance you want (as you can see with topics like prefetch strategies...). And this can be made more complicated if we rely on an external store that is not owned by us.

TL;DR: I would also be more comfortable with this change if we first integrated the object store abstraction into the repository.
