MohamedAbdeen21 commented on issue #9964: URL: https://github.com/apache/arrow-datafusion/issues/9964#issuecomment-2040746383
(This is not a "performance" issue but a user-experience improvement, and it requires upstream changes to arrow-rs/object_store.) The recent change in #9912 samples 10 random files to infer the partition columns, which means we may fail to catch corrupted or manually changed partitions at table creation (this shouldn't be a common case). The reason is that `ObjectStore` only provides a `list` function to retrieve objects. If we could provide a BFS approach to traverse the object store and use it in partition inference, that would be a nice QoL change. Curious to know how we feel about upstream changes for such non-critical improvements in DF?
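To make the BFS idea concrete, here is a minimal sketch using only the standard library. It is not DataFusion or object_store code: `list_children` is a hypothetical stand-in for a listing call that returns only the immediate child directories of a prefix, and `infer_partition_columns` is an illustrative name. The sketch assumes Hive-style `key=value` directories under a non-empty root prefix, and it reports an error when two directories at the same depth disagree on the column name.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical stand-in for a delimiter-aware listing call: returns the
/// immediate child "directories" of `prefix` among the given object paths.
fn list_children(paths: &[&str], prefix: &str) -> BTreeSet<String> {
    let mut children = BTreeSet::new();
    for p in paths {
        // Keep only paths strictly below `prefix/`.
        let rest = p.strip_prefix(prefix).and_then(|r| r.strip_prefix('/'));
        if let Some(rest) = rest {
            // A segment followed by '/' is a directory; a final segment is a file.
            if let Some((dir, _)) = rest.split_once('/') {
                children.insert(dir.to_string());
            }
        }
    }
    children
}

/// Breadth-first walk that infers one partition column per directory level.
/// Returns an error if a directory is not `key=value` or if directories at
/// the same depth use different keys.
fn infer_partition_columns(paths: &[&str], root: &str) -> Result<Vec<String>, String> {
    let mut columns: Vec<String> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((root.to_string(), 0));

    while let Some((prefix, depth)) = queue.pop_front() {
        for child in list_children(paths, &prefix) {
            let (key, _value) = child
                .split_once('=')
                .ok_or_else(|| format!("not a key=value directory: {prefix}/{child}"))?;
            if depth == columns.len() {
                // First directory seen at this depth defines the column name.
                columns.push(key.to_string());
            } else if columns[depth] != key {
                return Err(format!(
                    "expected `{}` at depth {depth}, found `{key}` in {prefix}/{child}",
                    columns[depth]
                ));
            }
            queue.push_back((format!("{prefix}/{child}"), depth + 1));
        }
    }
    Ok(columns)
}
```

Unlike sampling a fixed number of files, this visits every directory once, so a single mismatched partition is caught, at the cost of one listing call per directory.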
