wgtmac commented on PR #50158: URL: https://github.com/apache/arrow/pull/50158#issuecomment-4686798399
TBH, I don't think this is a good approach; we have tried it in the past. The main gotcha is that the cost of reading a column varies significantly with its type. For example, strings take longer to decompress and decode, while integers are smaller and faster. If the file is on a cloud object store, most of the time is spent blocked on I/O, which may exhaust the thread pool if the file is wide (has many columns).
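
To make the thread-pool concern concrete, here is a toy model in Python. It is only a sketch, not Arrow code: the `makespan` helper, the pool size of 8, the 100-column file, and the 50 ms I/O wait and 1 ms decode cost are all hypothetical numbers chosen for illustration. It assumes one task per column and that a task holds its thread while it waits on I/O.

```python
import heapq


def makespan(task_costs_ms, num_threads):
    """Wall time of a fixed-size pool under greedy scheduling.

    Each task is handed to whichever thread becomes free first and
    occupies that thread for its full cost (I/O wait included).
    """
    free_at = [0] * num_threads  # time at which each thread is next idle
    heapq.heapify(free_at)
    for cost in task_costs_ms:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Hypothetical workload: 100 integer columns, 8 pool threads.
NUM_COLUMNS = 100
POOL_SIZE = 8
IO_WAIT_MS = 50   # assumed object-store latency per column read
DECODE_MS = 1     # assumed CPU cost to decode one integer column

# Threads blocked on I/O: each task pins a thread for wait + decode.
blocking = makespan([IO_WAIT_MS + DECODE_MS] * NUM_COLUMNS, POOL_SIZE)

# Same CPU work if the pool never had to wait on I/O.
cpu_only = makespan([DECODE_MS] * NUM_COLUMNS, POOL_SIZE)

print(blocking)  # 663
print(cpu_only)  # 13
```

In this model the pool spends almost all of its wall time parked on I/O, and the gap grows with the number of columns. Skewed decode costs (for example, a few string columns among many integer ones) can be explored the same way by passing uneven values in `task_costs_ms`.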
