tustvold commented on issue #6946: URL: https://github.com/apache/arrow-rs/issues/6946#issuecomment-2574893357
I think this boils down to the characteristics of the backing store:

**Memory**

The data is already buffered in memory, so it doesn't matter which approach we use.

**Object Storage**

Object stores do not support true vectorized I/O; you can't fetch disjoint ranges in a single request. This means that fetching 5 pages from each column, as in the intermediate buffering approach, will actually perform one request per column. _Unless the column chunk is small, in which case it may just fetch the entire column chunk and chuck the extra pages away, which is even worse._ These requests are made in parallel, but they have potentially significant cost implications (see the first sketch below). I'd posit that the current approach is the right one for object stores.

**Local SSD**

Local SSD is where this gets potentially interesting, as something closer to the minimize buffering approach becomes viable. IMO this is where something like https://github.com/apache/arrow-rs/issues/5522 becomes more relevant, especially if wanting to use something like io_uring.

**Alternatives**

A simpler option might just be to write files with smaller row groups. This effectively "materializes" the intermediate buffering approach into the file, with the added benefit that it doesn't break IO coalescing (see the second sketch below). This is effectively the observation made by file formats like Lance when they got rid of the column chunk. There is a slight but real difference between one row group with 50 pages and 5 row groups with 10 pages each, concerning dictionary pages, but based on what I guess is motivating this request, this may be the better option.
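To make the object-store point concrete, here is a minimal sketch against the `object_store` crate's `get_ranges` API (using the `Range<u64>` signature from recent releases; older releases take `Range<usize>`). The file path, sizes, and page ranges are made up for illustration, and `InMemory` stands in for a real backend like S3; against a remote store, each sufficiently separated range becomes its own GET request, issued in parallel but billed individually:

```rust
use std::ops::Range;
use object_store::{memory::InMemory, path::Path, ObjectStore};

// Sketch only: assumes a tokio runtime and a recent object_store release.
#[tokio::main]
async fn main() -> object_store::Result<()> {
    let store = InMemory::new();
    let location = Path::from("data/file.parquet");
    // Stand-in for an already-written 10 MiB parquet file.
    store.put(&location, vec![0u8; 10 * 1024 * 1024].into()).await?;

    // Five hypothetical 4 KiB page ranges within a single column chunk,
    // spaced 2 MB apart so they cannot be coalesced into one read.
    let ranges: Vec<Range<u64>> = (0..5u64)
        .map(|i| i * 2_000_000..i * 2_000_000 + 4096)
        .collect();

    // Against a remote store, each disjoint range here turns into its own
    // GET request; there is no true vectored read underneath.
    let pages = store.get_ranges(&location, &ranges).await?;
    println!("fetched {} page buffers", pages.len());
    Ok(())
}
```

Note that the default `get_ranges` implementation does coalesce ranges that sit close together, so only genuinely disjoint reads turn into separate requests; that coalescing is exactly what the intermediate buffering approach would defeat.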
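And a sketch of the "smaller row groups" alternative, using the parquet crate's `ArrowWriter` and `WriterProperties::set_max_row_group_size`. The 100k row cap, column name, and file name are arbitrary choices for illustration:

```rust
use std::{fs::File, sync::Arc};
use arrow::array::{ArrayRef, Int64Array, RecordBatch};
use arrow::datatypes::{DataType, Field, Schema};
use parquet::{arrow::ArrowWriter, file::properties::WriterProperties};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let col: ArrayRef = Arc::new(Int64Array::from_iter_values(0i64..1_000_000));
    let batch = RecordBatch::try_new(schema.clone(), vec![col])?;

    // Cap each row group at 100k rows instead of the ~1M default, so the
    // "intermediate buffering" granularity is baked into the file itself
    // while reads within a row group still coalesce normally.
    let props = WriterProperties::builder()
        .set_max_row_group_size(100_000)
        .build();

    let file = File::create("small_row_groups.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```

The dictionary-page caveat above applies here: each row group carries its own column chunks, and therefore its own dictionary pages, which is the slight overhead of 5 row groups of 10 pages versus 1 row group of 50.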
