[GitHub] [arrow-datafusion] dispanser commented on issue #133: Add support for reading partitioned Parquet files

GitBox Tue, 27 Apr 2021 12:18:15 -0700


dispanser commented on issue #133:
URL: 
https://github.com/apache/arrow-datafusion/issues/133#issuecomment-827854576



   Is there any reason to limit this to Parquet files? In Spark, this 
functionality is shared between CSV, JSON, ORC and Parquet.
   
   Maybe the implementation could target the file listing in 
`physical_plan::common::build_file_list()`, which seems to be shared between 
the Parquet and CSV readers.
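   
   As a rough illustration of why this can be format-agnostic: assuming the 
Spark/Hive-style layout where partition columns are encoded as `key=value` 
directory names, the partition values can be read from the path alone, without 
opening the file. The helper name `parse_partition_values` is hypothetical and 
not part of DataFusion.

```rust
use std::path::Path;

/// Extracts `key=value` partition columns from the directory components of a
/// file path. Hypothetical helper for illustration, not a DataFusion API.
/// Assumes a Hive-style layout such as `data/year=2021/month=04/file.parquet`.
fn parse_partition_values(path: &Path) -> Vec<(String, String)> {
    path.parent() // ignore the file name itself, only look at directories
        .into_iter()
        .flat_map(|dir| dir.components())
        .filter_map(|c| c.as_os_str().to_str())
        .filter_map(|segment| segment.split_once('='))
        .filter(|(key, _)| !key.is_empty())
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

   The file extension plays no role here, so the same logic would apply to 
CSV, JSON or Parquet files alike.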
   
   Considering #204 (adding partition pruning), it may be sensible to implement 
the partition pruning logic early, in the file listing procedure 
itself, as it could save on file listing operations, which tend to be expensive, 
particularly on cloud object storage (e.g. S3).
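   
   A minimal sketch of pruning during listing, under stated assumptions: the 
directory tree is modelled as an in-memory map, the filter is a simple equality 
on partition columns, and all names (`Tree`, `Filter`, `dir_matches`, 
`list_pruned`) are hypothetical rather than DataFusion APIs. A counter stands 
in for the number of listing requests issued.

```rust
use std::collections::HashMap;

/// Toy directory tree: maps a directory path to its children.
/// Children ending in '/' are sub-directories, everything else is a file.
type Tree = HashMap<String, Vec<String>>;

/// Equality filter on partition columns, e.g. year = "2021".
type Filter = HashMap<String, String>;

/// Returns false if `dir_name` is a `key=value` segment whose value
/// contradicts the filter. Such a directory is never listed.
fn dir_matches(dir_name: &str, filter: &Filter) -> bool {
    match dir_name.split_once('=') {
        Some((key, value)) => filter
            .get(key)
            .map_or(true, |wanted| wanted.as_str() == value),
        None => true, // not a partition directory, keep descending
    }
}

/// Recursively collects files under `dir`, skipping partition directories
/// that cannot match. `list_calls` counts the directory listings performed.
fn list_pruned(
    tree: &Tree,
    dir: &str,
    filter: &Filter,
    list_calls: &mut usize,
    out: &mut Vec<String>,
) {
    *list_calls += 1; // stands in for one potentially expensive listing request
    if let Some(children) = tree.get(dir) {
        for child in children {
            match child.strip_suffix('/') {
                Some(name) => {
                    if dir_matches(name, filter) {
                        let sub_dir = format!("{}{}", dir, child);
                        list_pruned(tree, &sub_dir, filter, list_calls, out);
                    }
                }
                None => out.push(format!("{}{}", dir, child)),
            }
        }
    }
}
```

   With a filter on `year`, every non-matching `year=...` directory is skipped 
before it is listed, so the number of listing requests grows with the matching 
partitions rather than with all partitions.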
   
   I'd love to work on this, but I'd need a bit of guidance on the preferred 
approach.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
