ozgrakkurt opened a new issue, #3141: URL: https://github.com/apache/arrow-datafusion/issues/3141
I have a large number of Parquet files, and they are partitioned by a column in their schema. Currently, if I run a query that filters on this column, it seems that all of the files are checked. If the listing table kept statistics for this column per file and pruned the file list when running the query, it would open a single file and a single row group inside that file. This would dramatically improve performance. I am aware that a similar result can be achieved with the `table_partition_cols` config on `ListingOptions`, but this feature would be much easier to use (for me at least). Would this make sense to implement? If yes, I can work on it.
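
To make the proposal concrete, here is a minimal sketch of the pruning idea, assuming the listing table has already collected a min/max pair for the column from each file's Parquet metadata. The names `FileColumnStats`, `prune_files`, and `sample_files` are hypothetical and are not part of DataFusion's API; the file names and value ranges are made up for illustration.

```rust
/// Hypothetical per-file statistics for a single column.
/// This is an illustrative type, not a DataFusion struct.
#[derive(Debug, Clone, PartialEq)]
struct FileColumnStats {
    path: String,
    min: i64,
    max: i64,
}

/// Keep only the files whose [min, max] range could contain `value`.
/// A file is skipped when the equality predicate `column = value`
/// cannot match any row in it according to its statistics.
fn prune_files(files: &[FileColumnStats], value: i64) -> Vec<&FileColumnStats> {
    files
        .iter()
        .filter(|f| f.min <= value && value <= f.max)
        .collect()
}

/// Made-up listing of three files with non-overlapping ranges,
/// as would be the case when the data is partitioned by the column.
fn sample_files() -> Vec<FileColumnStats> {
    vec![
        FileColumnStats { path: "part-0.parquet".to_string(), min: 0, max: 99 },
        FileColumnStats { path: "part-1.parquet".to_string(), min: 100, max: 199 },
        FileColumnStats { path: "part-2.parquet".to_string(), min: 200, max: 299 },
    ]
}
```

With this sample listing, `prune_files(&sample_files(), 150)` keeps only `part-1.parquet`, so a scan would open one file instead of three. The same range check could then be repeated against row-group statistics inside that file.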
