[GitHub] [arrow-datafusion] alamb commented on issue #2205: RFC: Spill-To-Disk Object Storage Download

GitBox Wed, 13 Apr 2022 05:05:07 -0700


alamb commented on issue #2205:
URL: 
https://github.com/apache/arrow-datafusion/issues/2205#issuecomment-1097971612


   > Why would you implement this in the ObjectStore API, and not some FileScan 
component generic over object stores. The caching, spilling, logic, etc... is 
not going to vary based on object store provider? An ObjectStore API that 
supports fetch requests with an optional byte range should have us covered?
   
   I was thinking that keeping things behind an ObjectStore API makes sense 
because:
   1. The economics and performance of S3, Glacier, HDFS, and a local MinIO 
could be quite different, so the amount of request consolidation, the number of 
requests, and the aggressiveness of caching might vary by object store 
implementation (not sure)
   2. Some caching strategies / implementations (e.g. Redis) might not be 
appropriate to include in core DataFusion
   
   So in other words, binding the details of caching / resource usage to 
DataFusion seemed unnecessary
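
   To make the idea concrete, here is a rough sketch of a caching policy living behind the store interface rather than in the scan code. Everything in it is an assumption for illustration: `RangeStore`, `MemoryStore`, and `CachingStore` are hypothetical names, not DataFusion's actual `ObjectStore` trait, and the "fetch the whole object once, then serve ranges locally" policy is just one possible consolidation strategy that a particular store might pick.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::io;
use std::ops::Range;

/// Hypothetical, simplified stand-in for an object store interface:
/// fetch an object, optionally restricted to a byte range.
pub trait RangeStore {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>>;
}

/// Copy out the requested range, or the whole buffer if no range is given.
fn slice(data: &[u8], range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
    match range {
        None => Ok(data.to_vec()),
        Some(r) => data
            .get(r)
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds")),
    }
}

/// Toy backend (illustration only): objects live in memory and every
/// call to `fetch` is counted as one "remote" request.
pub struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
    requests: RefCell<usize>,
}

impl MemoryStore {
    pub fn new() -> Self {
        MemoryStore { objects: HashMap::new(), requests: RefCell::new(0) }
    }

    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.objects.insert(path.to_string(), data.to_vec());
    }

    pub fn requests(&self) -> usize {
        *self.requests.borrow()
    }
}

impl RangeStore for MemoryStore {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
        *self.requests.borrow_mut() += 1;
        let data = self
            .objects
            .get(path)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))?;
        slice(data, range)
    }
}

/// Wrapper that implements the same trait, so callers cannot tell it apart
/// from a plain store. Policy assumed here: download the whole object on
/// first access, then answer all range requests from the local copy.
pub struct CachingStore<S: RangeStore> {
    inner: S,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<S: RangeStore> CachingStore<S> {
    pub fn new(inner: S) -> Self {
        CachingStore { inner, cache: RefCell::new(HashMap::new()) }
    }

    pub fn inner(&self) -> &S {
        &self.inner
    }
}

impl<S: RangeStore> RangeStore for CachingStore<S> {
    fn fetch(&self, path: &str, range: Option<Range<usize>>) -> io::Result<Vec<u8>> {
        let cached = self.cache.borrow().contains_key(path);
        if !cached {
            // One consolidated request to the underlying store.
            let whole = self.inner.fetch(path, None)?;
            self.cache.borrow_mut().insert(path.to_string(), whole);
        }
        let cache = self.cache.borrow();
        let data = cache.get(path).expect("entry was inserted above");
        slice(data, range)
    }
}
```

   With something shaped like this, code that scans files only ever sees `RangeStore::fetch`; whether two range reads turn into two remote requests or one is decided by whichever store implementation is plugged in, and a Redis-backed or spill-to-disk variant could live outside the core crate.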
   
   
   


