westonpace commented on PR #34281: URL: https://github.com/apache/arrow/pull/34281#issuecomment-1452808447
> Maybe it's better to define default RowGroup length for different FS or storage backend.

Agreed.

> By the way, it's funny that, in arrow's io-merging in S3, the default range limit is 32M :)

AWS [suggests](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html) 8MB or 16MB. 32MB is probably fine. It's still possible to get 32MiB reads, for example when reading multiple columns at a time. My own testing with HDD shows that ~4MB is good enough to get sequential reads. 1Mi rows, without any encodings, will yield ~4-8MiB for a column (assuming 4 or 8 bytes per value), so in many cases I think the reads will still be large enough.

> Maybe it's better to define default RowGroup length for different FS or storage backend.

That is probably a good idea.
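The size estimate in the comment (1Mi rows at 4 or 8 bytes per value) can be checked with a quick calculation. This is only a rough sketch: the helper names `column_chunk_mib` and `chunks_per_read` are made up for illustration and are not Arrow or Parquet APIs. It ignores encodings, compression, page headers, and null handling, and it assumes the "32M" range limit means 32 MiB.

```python
MIB = 1024 * 1024


def column_chunk_mib(rows_per_group: int, bytes_per_value: int) -> float:
    """Raw size of one fixed-width column chunk in MiB (no encoding, no compression)."""
    return rows_per_group * bytes_per_value / MIB


def chunks_per_read(read_limit_mib: float, chunk_mib: float) -> int:
    """How many whole column chunks of this size fit in one merged read of the given limit."""
    return int(read_limit_mib // chunk_mib)


rows = 1024 * 1024  # 1Mi rows per row group, as in the comment
read_limit_mib = 32  # assumed meaning of the "32M" range limit

for width in (4, 8):
    size = column_chunk_mib(rows, width)
    fit = chunks_per_read(read_limit_mib, size)
    print(f"{width}-byte values: {size:.0f} MiB per column chunk, "
          f"{fit} chunks per {read_limit_mib} MiB read")
```

Under these assumptions a single column chunk is already at or above the ~4MB that was enough for sequential HDD reads, and a 32 MiB read would only be reached when several columns are read together.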
