Re: [I] [EPIC] Improve the performance of ListingTable [arrow-datafusion]

via GitHub Fri, 05 Apr 2024 16:05:55 -0700


MohamedAbdeen21 commented on issue #9964:
URL: 
https://github.com/apache/arrow-datafusion/issues/9964#issuecomment-2040746383


   (It's not a "performance" issue, but rather for better user experience. 
Also, it requires upstream changes to arrow-rs/object_store.)
   
   The recent change in #9912 uses 10 random files to infer the partition 
columns. This means we may fail to catch corrupted or manually changed 
partitions on table creation (which shouldn't be a common case). This is 
because `ObjectStore` only provides a `list` function to retrieve objects.
   
   If we could provide a BFS approach to traverse the object store and use 
that in partition inference, that would be a nice QoL change. 
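
   To make the idea concrete, here is a rough, std-only sketch of what a 
level-by-level check could look like. It is not DataFusion or `object_store` 
code: `child_dirs` and `infer_partition_columns` are hypothetical names, a 
flat slice of paths stands in for the store, and Hive-style `key=value` 
directories are assumed.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical stand-in for a per-level listing: returns the immediate
/// child "directories" of `prefix`, derived from a flat list of object paths.
/// `prefix` is expected to be empty or to end with '/'.
fn child_dirs(paths: &[&str], prefix: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    for p in paths {
        // Only consider objects that live under `prefix`.
        let rest = match p.strip_prefix(prefix) {
            Some(r) => r,
            None => continue,
        };
        // A child directory is the first path segment that is followed by '/'.
        if let Some(idx) = rest.find('/') {
            out.insert(format!("{}{}/", prefix, &rest[..idx]));
        }
    }
    out
}

/// Breadth-first walk of the directories under `root`, checking that every
/// directory at a given depth uses the same `key=value` partition column.
/// Returns the partition columns in order, or an error naming the first
/// directory that disagrees.
fn infer_partition_columns(paths: &[&str], root: &str) -> Result<Vec<String>, String> {
    let mut columns: Vec<String> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((root.to_string(), 0));

    while let Some((dir, depth)) = queue.pop_front() {
        for child in child_dirs(paths, &dir) {
            // Last segment of the child path, without the trailing '/'.
            let segment = child[dir.len()..].trim_end_matches('/');
            let key = match segment.split_once('=') {
                Some((k, _)) => k.to_string(),
                None => return Err(format!("not a key=value directory: {}", child)),
            };
            if depth == columns.len() {
                // First directory seen at this depth defines the column name.
                columns.push(key);
            } else if columns[depth] != key {
                return Err(format!(
                    "expected partition column `{}` at {}, found `{}`",
                    columns[depth], child, key
                ));
            }
            queue.push_back((child, depth + 1));
        }
    }
    Ok(columns)
}
```

   Because every directory is visited, a stray `day=...` directory sitting 
next to `month=...` ones is reported instead of being missed by sampling. 
The sketch only compares column names per level; it does not check that all 
files sit at the same depth.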
   
   Curious to know how we feel about upstream changes for such non-critical 
changes in DF?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
