[GitHub] [arrow-datafusion] houqp commented on issue #133: Add support for reading partitioned Parquet files

GitBox Wed, 12 May 2021 12:36:17 -0700


houqp commented on issue #133:
URL: 
https://github.com/apache/arrow-datafusion/issues/133#issuecomment-840044928



   Hive partitioning is the most commonly used scheme, but there are others. 
For example, the Python Arrow package (pyarrow) supports both directory 
partitioning and Hive partitioning: 
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html?highlight=partition
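
   To make the difference between the two schemes concrete, here is a rough 
sketch in plain Python (standard library only, not pyarrow or DataFusion 
code). The function names and the example paths are made up for illustration. 
In the Hive scheme the column name is embedded in each directory segment; in 
the directory scheme only the values appear, so the names have to be supplied 
separately.

```python
from pathlib import PurePosixPath


def parse_hive(rel_path):
    """Hive scheme: every directory segment has the form `key=value`."""
    pairs = {}
    for segment in PurePosixPath(rel_path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a hive partition segment: {segment!r}")
        pairs[key] = value
    return pairs


def parse_directory(rel_path, field_names):
    """Directory scheme: segments are bare values; the caller supplies names."""
    segments = PurePosixPath(rel_path).parts[:-1]  # skip the file name
    if len(segments) != len(field_names):
        raise ValueError("path depth does not match the number of fields")
    return dict(zip(field_names, segments))


# Hypothetical layouts for the same logical partition.
hive = parse_hive("year=2021/month=05/part-0.parquet")
directory = parse_directory("2021/05/part-0.parquet", ["year", "month"])
```

   Both calls yield the same name-to-value mapping. The values stay as 
strings here; casting them to typed columns would be a separate step.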
   
   I agree with @Dandandan that we should add the concept of a partition 
column first, then tackle how we serialize and deserialize partition values 
to and from file paths. I can see us going the Python Arrow route as well, 
i.e. supporting multiple partitioning schemes.
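
   As a rough illustration of that ordering, the sketch below (plain Python, 
all names hypothetical, not DataFusion's API) models partition columns as 
part of the table schema and treats the path parsing as a pluggable function, 
so a second scheme can be added later without touching the table definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: maps a relative file path to partition values.
PathParser = Callable[[str], Dict[str, str]]


@dataclass
class PartitionedTable:
    file_columns: List[str]       # columns stored inside each Parquet file
    partition_columns: List[str]  # columns whose values live only in the path
    parse_path: PathParser        # pluggable scheme: path -> partition values

    def schema(self) -> List[str]:
        # Partition columns are appended after the columns read from files.
        return self.file_columns + self.partition_columns

    def partition_values(self, rel_path: str) -> List[str]:
        values = self.parse_path(rel_path)
        return [values[name] for name in self.partition_columns]


def hive_parser(rel_path: str) -> Dict[str, str]:
    # Assumes every directory segment looks like `key=value`.
    segments = rel_path.split("/")[:-1]  # drop the file name
    return dict(segment.split("=", 1) for segment in segments)


def directory_parser(rel_path: str) -> Dict[str, str]:
    # Assumes a fixed `year/month` layout; names are not in the path.
    return dict(zip(["year", "month"], rel_path.split("/")[:-1]))


hive_table = PartitionedTable(["id", "amount"], ["year", "month"], hive_parser)
dir_table = PartitionedTable(["id", "amount"], ["year", "month"], directory_parser)

schema = hive_table.schema()
hive_values = hive_table.partition_values("year=2021/month=05/part-0.parquet")
dir_values = dir_table.partition_values("2021/05/part-0.parquet")
```

   The table definition only knows that `year` and `month` are partition 
columns; which scheme produced the values is an implementation detail of the 
parser passed in.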


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
