[GitHub] [arrow-datafusion] ozgrakkurt opened a new issue, #3141: Use existing parquet column as a partition column in ListingTable

GitBox Mon, 15 Aug 2022 00:42:52 -0700


ozgrakkurt opened a new issue, #3141:
URL: https://github.com/apache/arrow-datafusion/issues/3141


   I have a large number of parquet files, and they are partitioned by a column
in their schema. Currently, if I run a query that filters on this column, it
seems that all of the files are checked. If the listing table kept statistics
for this column per file and pruned the file list when running the query, it
could open a single file and a single row group inside that file. This would
dramatically improve performance.
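
A minimal sketch of the pruning idea, assuming each file carries a minimum and maximum value for the column. The `FileColumnStats` type and `prune_files` function are hypothetical names used only for illustration; they are not part of DataFusion's API.

```rust
/// Per-file statistics for one column.
/// Hypothetical type for illustration, not a DataFusion API.
#[derive(Debug, Clone, PartialEq)]
pub struct FileColumnStats {
    pub path: String,
    pub min: i64,
    pub max: i64,
}

/// Keep only the files whose [min, max] range could contain `value`.
/// Files outside the range never need to be opened for an equality filter.
pub fn prune_files(files: &[FileColumnStats], value: i64) -> Vec<&FileColumnStats> {
    files
        .iter()
        .filter(|f| f.min <= value && value <= f.max)
        .collect()
}
```

When the files are partitioned by the column so that their ranges do not overlap, an equality filter leaves at most one file to scan.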
   
   I am aware that something similar can be achieved with the
`table_partition_cols` config on `ListingOptions`, but this feature would be
much easier to use (for me at least).
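
For comparison, as far as I understand it, `table_partition_cols` takes partition values from hive-style `column=value` directory names in the file path rather than from the file contents. A rough sketch of that path-based lookup is below; the `partition_value` helper and the `block` column name are assumptions made for illustration, not DataFusion code.

```rust
/// Extract the value of partition column `col` from a hive-style path such as
/// `data/block=42/part-0.parquet`. Returns `None` if no path segment matches.
/// Hypothetical helper for illustration, not a DataFusion API.
pub fn partition_value<'a>(path: &'a str, col: &str) -> Option<&'a str> {
    path.split('/').find_map(|segment| {
        // Only segments of the form `key=value` can name a partition.
        let (key, value) = segment.split_once('=')?;
        if key == col { Some(value) } else { None }
    })
}
```

This approach requires the files to be laid out in such directories up front, which is the extra step the proposed feature would avoid.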
   
   Would this make sense to implement? If so, I can work on it.


