alamb commented on issue #6946:
URL: https://github.com/apache/arrow-rs/issues/6946#issuecomment-2575037114

   > **Local SSD**
   > 
   > Local SSD is the option where this gets potentially interesting, as 
something closer to the minimize buffering approach becomes viable. IMO this is 
where something like #5522 becomes more relevant, especially if wanting to use 
something like io_uring. The current abstraction just starts to fall apart at 
this point, and I'm not sure it is sensible to try to contort it to make it 
work.
   
   This is a very good observation -- the use case we have is exactly this (data 
is local on SSD, not on remote object storage).
   
   I also think there might be value in a hybrid approach to reduce RAM 
requirements: fetching data from an object store to a local SSD and then 
reading it more incrementally.
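
A minimal sketch of that hybrid idea, using only the Rust standard library. It
is not code from arrow-rs or the `object_store` crate: `stage_to_local` and
`read_incrementally` are hypothetical helper names, and the `remote` reader is
a stand-in for whatever download stream the object store client provides.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Copy the bytes of a remote object into a local file (for example on SSD).
/// `remote` is an assumption: any `Read` standing in for an object store stream.
/// `io::copy` streams through a small internal buffer, so the whole object is
/// never held in RAM. Returns the number of bytes written.
fn stage_to_local<R: Read>(mut remote: R, local: &Path) -> io::Result<u64> {
    let mut file = File::create(local)?;
    let copied = io::copy(&mut remote, &mut file)?;
    file.flush()?;
    Ok(copied)
}

/// Read the staged file back in pieces of at most `chunk_size` bytes, passing
/// each piece to `on_chunk`. Only one chunk is buffered at a time.
/// Returns the total number of bytes read.
fn read_incrementally<F: FnMut(&[u8])>(
    local: &Path,
    chunk_size: usize,
    mut on_chunk: F,
) -> io::Result<u64> {
    let mut file = File::open(local)?;
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        // `read` may return fewer bytes than requested; 0 means end of file.
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        on_chunk(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

Here peak memory is bounded by `chunk_size` rather than by the object size. A
real Parquet reader would issue ranged reads per column chunk or page rather
than scanning sequentially, so this only illustrates the staging step.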
   
   > A simpler option might just be to write files with smaller row groups, 
this effectively "materializes" the intermediate buffering approach into the 
file, but with the added benefit that it doesn't break IO coalescing. This is 
effectively the observation made by file formats like Lance when they got rid 
of the column chunk.
   
   Indeed, this is exactly the workaround we are trying internally, and I or 
@hiltontj will report back here on how well it worked.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
