alamb commented on issue #6946:
URL: https://github.com/apache/arrow-rs/issues/6946#issuecomment-2575037114
> **Local SSD**
>
> Local SSD is the option where this gets potentially interesting, as something closer to the minimize buffering approach becomes viable. IMO this is where something like #5522 becomes more relevant, especially if wanting to use something like io_uring. The current abstraction just starts to fall apart at this point, and I'm not sure it is sensible to try to contort it to make it work.

This is a very good observation: the use case we have is exactly this (the data is on a local SSD, not on remote object storage).

I also think there might be value in a hybrid approach to reduce RAM requirements: fetching data from an object store to a local SSD and then reading it more incrementally.

> A simpler option might just be to write files with smaller row groups, this effectively "materializes" the intermediate buffering approach into the file, but with the added benefit that it doesn't break IO coalescing. This is effectively the observation made by file formats like Lance when they got rid of the column chunk.

Indeed, this is exactly the workaround we are trying internally, and either I or @hiltontj will report back here on how well it worked.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
