[GitHub] [arrow-datafusion] rdettai opened a new issue #1139: Implement partitioned read in listing table provider

GitBox Mon, 18 Oct 2021 03:31:21 -0700


rdettai opened a new issue #1139:
URL: https://github.com/apache/arrow-datafusion/issues/1139



   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   It is common to organize data files into partitions. There are many ways to do this, but Hive-style partitioning is the most common:
   ```
   /table_path/customer=1/year=2020/file001.parquet
   ...
   /table_path/customer=1/year=2020/file009.parquet
   /table_path/customer=2/year=2020/filexxx.parquet
   /table_path/customer=1/year=2021/filexxx.parquet
   /table_path/customer=3/year=2021/filexxx.parquet
   ```
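To make the layout above concrete, here is a minimal std-only Rust sketch of extracting the `key=value` directory segments from a path relative to the table root. `parse_hive_partitions` is a hypothetical helper, not a DataFusion API. It assumes that values need no URL-decoding and that the last path segment is always the file name.

```rust
/// Hypothetical helper (not a DataFusion API): extracts Hive-style
/// `key=value` partition segments from a file path relative to the table root.
/// The last path segment is the file name and is never treated as a partition.
pub fn parse_hive_partitions(relative_path: &str) -> Vec<(String, String)> {
    let segments: Vec<&str> = relative_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    // Drop the file name (last segment) and keep only the directory segments.
    let dirs = match segments.split_last() {
        Some((_file_name, dirs)) => dirs,
        None => return Vec::new(),
    };
    dirs.iter()
        .filter_map(|segment| {
            // `split_once` returns None for directories without '=',
            // so non-partition directories are skipped.
            segment
                .split_once('=')
                .map(|(key, value)| (key.to_string(), value.to_string()))
        })
        .collect()
}
```

For example, `customer=1/year=2020/file001.parquet` yields `[("customer", "1"), ("year", "2020")]`, and a bare file name yields no partition values.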
   
   **Describe the solution you'd like**
   In the `ListingTableProvider`, when resolving the list of files:
   - each file's path should be parsed, and the resulting `PartitionedFile` should contain the values of all partition dimensions.
   - files that belong to partitions excluded by the query filter should be skipped.
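The pruning step described in the list above can be sketched in Rust, assuming the partition values have already been parsed from each path and that the filter is a simple conjunction of `column = value` equalities. `ListedFile`, `EqualityFilter`, and `prune_files` are hypothetical names, not DataFusion's actual types; a real implementation would evaluate arbitrary filter expressions against the partition columns.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a listed file: its path plus the partition values
/// parsed from that path. (DataFusion's real `PartitionedFile` differs.)
#[derive(Debug, Clone, PartialEq)]
pub struct ListedFile {
    pub path: String,
    pub partition_values: HashMap<String, String>,
}

/// Hypothetical, simplified filter: a conjunction of `column = value` equalities.
pub type EqualityFilter = HashMap<String, String>;

/// Keeps only the files whose partitions may satisfy the filter.
/// A file is skipped only when it has a value for a filtered column and that
/// value differs; if the column is not one of the file's partitions, the file
/// cannot be excluded from its path alone and is kept.
pub fn prune_files(files: &[ListedFile], filter: &EqualityFilter) -> Vec<ListedFile> {
    files
        .iter()
        .filter(|file| {
            filter.iter().all(|(column, expected)| {
                match file.partition_values.get(column) {
                    Some(actual) => actual == expected,
                    None => true,
                }
            })
        })
        .cloned()
        .collect()
}
```

Keeping a file whenever a filtered column is not one of its partitions ensures pruning never drops data the filter could still match; predicates on regular columns still have to be evaluated on the rows that are read.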
   
   **Additional context**
   Closing https://github.com/apache/arrow-datafusion/issues/133 in favor of this one.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.