alamb opened a new pull request #370: URL: https://github.com/apache/arrow-datafusion/pull/370
# Which issue does this PR close?

re https://github.com/apache/arrow-datafusion/issues/363 (leaving as draft until #365 is in)

# Rationale for this change

As explained on #363, the high-level goal is to make the parquet row group pruning logic generic to any type of min/max statistics (not just parquet metadata).

# What changes are included in this PR?

1. Changes the *output* of `PruningPredicateBuilder` to be a `bool` for each of the input statistics
2. Moves the parquet-specific functionality (aka the function signature required for the `ParquetFileReader`) into the parquet.rs module
3. Returns errors from `build_pruning_predicate` rather than silently ignoring them (though they are still silently ignored in parquet.rs, as before)
4. Improves some docstrings

# Are there any user-facing changes?

No change in parquet functionality is intended in this PR.

# Sequence:

My next PR will change the *input* of the `PruningPredicateBuilder` to be generic. I am trying to do this in a few small PRs to reduce review burden. Here is how I plan for them to connect together:

Planned changes:
- [x] Refactor code into a new module (https://github.com/apache/arrow-datafusion/pull/365)
- [x] Return bool rather than parquet-specific output (this PR)
- [ ] Add `PruningStatstics` Trait (forthcoming PR)
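To illustrate the "one `bool` per input statistic" idea described in this PR, here is a minimal, self-contained sketch. It is not the actual DataFusion code: `MinMax`, `prune_equals`, and `row_group_filter` are hypothetical names, the predicate is hard-coded to `column = value`, and the closure signature is a simplification of whatever the parquet reader really requires.

```rust
/// Hypothetical min/max statistics for one container (e.g. one parquet row group).
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

/// Generic pruning step (assumed shape): evaluate the predicate `column = value`
/// against each container's statistics and return one `bool` per container.
/// `true` means the container may contain matching rows and must be read;
/// `false` means it can be skipped safely.
fn prune_equals(stats: &[MinMax], value: i64) -> Vec<bool> {
    stats
        .iter()
        .map(|s| s.min <= value && value <= s.max)
        .collect()
}

/// Parquet-specific adapter (assumed shape): turn the generic `Vec<bool>` into a
/// per-row-group filter closure. Unknown indexes are conservatively kept.
fn row_group_filter(keep: Vec<bool>) -> impl Fn(usize) -> bool {
    move |row_group_index| keep.get(row_group_index).copied().unwrap_or(true)
}

fn main() {
    let stats = [
        MinMax { min: 0, max: 4 },
        MinMax { min: 3, max: 10 },
        MinMax { min: 20, max: 30 },
    ];

    // Generic part: knows nothing about parquet, only about min/max values.
    let keep = prune_equals(&stats, 5);
    println!("keep = {:?}", keep);

    // Parquet-specific part: adapts the booleans to a row group filter.
    let filter = row_group_filter(keep);
    for i in 0..stats.len() {
        println!("row group {} read: {}", i, filter(i));
    }
}
```

Keeping the generic step's output as plain booleans means the same pruning logic could later be driven by statistics from sources other than parquet metadata, with only the thin adapter being format-specific.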
