GaneshPatil7517 opened a new pull request, #19848:
URL: https://github.com/apache/datafusion/pull/19848
## Summary
Implements Phase 1 of reverse page ordering for the Parquet sort pushdown
optimization, addressing issue #19486. This phase only adds and propagates the
`reverse_pages` flag; page-level reversal itself is deferred to Phase 2.
## What's Changed
- Added `reverse_pages` flag to `ParquetSource` struct with getter/setter
methods
- Added `reverse_pages` field to `ParquetOpener` struct via builder pattern
- Extended `try_reverse_output()` to set both `reverse_row_groups` and
`reverse_pages` flags when optimizing descending sorts
- Wired flag propagation through the existing FileSource → ParquetOpener →
ParquetSource call chain
- Updated display formatting to show `reverse_pages` when enabled
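
The changes listed above can be pictured with a minimal, self-contained sketch. `SourceSketch` and `reversed_for_descending_sort` are hypothetical stand-ins invented for illustration; they are not the real `ParquetSource` type or the real `try_reverse_output()` signature, and the display text is an assumed format.

```rust
use std::fmt;

/// Hypothetical stand-in for `ParquetSource`, reduced to the two flags.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SourceSketch {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

impl SourceSketch {
    /// Builder-style setter (assumed name, modelled on the PR description).
    pub fn with_reverse_pages(mut self, reverse_pages: bool) -> Self {
        self.reverse_pages = reverse_pages;
        self
    }

    /// Getter for the page-level flag.
    pub fn reverse_pages(&self) -> bool {
        self.reverse_pages
    }

    /// Getter for the row-group-level flag.
    pub fn reverse_row_groups(&self) -> bool {
        self.reverse_row_groups
    }

    /// Mimics the described behaviour of `try_reverse_output()`:
    /// return a copy of the source with both flags enabled.
    pub fn reversed_for_descending_sort(&self) -> Self {
        let mut out = self.clone();
        out.reverse_row_groups = true;
        out.reverse_pages = true;
        out
    }
}

impl fmt::Display for SourceSketch {
    /// Prints each flag only when it is enabled (assumed output format).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parquet")?;
        if self.reverse_row_groups {
            write!(f, ", reverse_row_groups=true")?;
        }
        if self.reverse_pages {
            write!(f, ", reverse_pages=true")?;
        }
        Ok(())
    }
}
```

In this sketch a default source prints as `parquet`, and the flags appear in the output only after `reversed_for_descending_sort()` has been applied.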
## Architecture
This implementation follows the established pattern of `reverse_row_groups`:
- Infrastructure flag is added to both source and opener structs
- Flag is set via builder pattern for clean API design
- Propagation through `try_reverse_output()` ensures coordination with row
group reversal
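
As a rough illustration of this pattern, the sketch below copies both flags from a source-side struct into an opener through a builder. All type and method names here (`SourceFlags`, `OpenerSketch`, `OpenerSketchBuilder`, `create_opener`) are hypothetical; the real `ParquetOpener` carries many more fields and its builder API may differ.

```rust
/// Hypothetical stand-in for the opener; only the two flags are modelled.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct OpenerSketch {
    pub reverse_row_groups: bool,
    pub reverse_pages: bool,
}

/// Hypothetical builder for `OpenerSketch`.
#[derive(Debug, Default)]
pub struct OpenerSketchBuilder {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

impl OpenerSketchBuilder {
    pub fn with_reverse_row_groups(mut self, value: bool) -> Self {
        self.reverse_row_groups = value;
        self
    }

    pub fn with_reverse_pages(mut self, value: bool) -> Self {
        self.reverse_pages = value;
        self
    }

    pub fn build(self) -> OpenerSketch {
        OpenerSketch {
            reverse_row_groups: self.reverse_row_groups,
            reverse_pages: self.reverse_pages,
        }
    }
}

/// Hypothetical stand-in for the flags held on the source side.
#[derive(Debug, Clone, Copy, Default)]
pub struct SourceFlags {
    pub reverse_row_groups: bool,
    pub reverse_pages: bool,
}

impl SourceFlags {
    /// Propagates both flags to the opener at construction time,
    /// so the two levels of reversal stay coordinated.
    pub fn create_opener(&self) -> OpenerSketch {
        OpenerSketchBuilder::default()
            .with_reverse_row_groups(self.reverse_row_groups)
            .with_reverse_pages(self.reverse_pages)
            .build()
    }
}
```

Passing the flags through the builder keeps the opener's construction in one place, which is the reason the description gives for mirroring `reverse_row_groups`.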
## Testing
- ✅ All 27 existing reverse-related tests pass
- ✅ Added 4 new comprehensive tests for `reverse_pages` functionality:
  - `test_reverse_pages_default_value` - Verifies default is false
  - `test_reverse_pages_with_setter` - Verifies setter works correctly
  - `test_reverse_pages_clone_preserves_value` - Ensures cloning preserves
    state
  - `test_reverse_pages_independent_of_reverse_row_groups` - Confirms
    independent flag operation
- ✅ No regressions
- ✅ Code quality verified:
  - `cargo fmt` - properly formatted
  - `cargo clippy -- -D warnings` - no warnings
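
For readers who want a feel for what the four new tests check, here is a self-contained sketch against a hypothetical `FlagsSketch` struct. It is not the PR's test code: the real tests exercise `ParquetSource`, and the setter names used here are assumptions.

```rust
/// Hypothetical stand-in holding the two independent flags.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FlagsSketch {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

impl FlagsSketch {
    pub fn with_reverse_pages(mut self, value: bool) -> Self {
        self.reverse_pages = value;
        self
    }

    pub fn with_reverse_row_groups(mut self, value: bool) -> Self {
        self.reverse_row_groups = value;
        self
    }

    pub fn reverse_pages(&self) -> bool {
        self.reverse_pages
    }

    pub fn reverse_row_groups(&self) -> bool {
        self.reverse_row_groups
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn default_value_is_false() {
        assert!(!FlagsSketch::default().reverse_pages());
    }

    #[test]
    fn setter_updates_flag() {
        assert!(FlagsSketch::default().with_reverse_pages(true).reverse_pages());
    }

    #[test]
    fn clone_preserves_value() {
        let original = FlagsSketch::default().with_reverse_pages(true);
        assert_eq!(original.clone(), original);
    }

    #[test]
    fn independent_of_reverse_row_groups() {
        // Setting one flag must leave the other untouched, in both directions.
        let pages_only = FlagsSketch::default().with_reverse_pages(true);
        assert!(!pages_only.reverse_row_groups());
        let groups_only = FlagsSketch::default().with_reverse_row_groups(true);
        assert!(!groups_only.reverse_pages());
    }
}
```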
## Phase 1 Design Rationale
This Phase 1 implementation establishes infrastructure for future page-level
reversal. Actual page reversal implementation is deferred to Phase 2 because:
- Arrow-rs `ParquetRecordBatchStreamBuilder` currently lacks public APIs for
page-level reversal
- Materializing all pages in memory for reversal would have significant
performance implications
- Separating infrastructure (Phase 1) from implementation (Phase 2) enables
parallel development
Phase 2 can implement actual page reversal once arrow-rs provides the necessary
page-level APIs or an alternative approach becomes available.
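
To make the memory concern concrete, the sketch below contrasts two generic strategies. It is purely illustrative and uses no arrow-rs or DataFusion API: it assumes that row groups can be selected by index, while pages arrive through a forward-only iterator. Both function names are hypothetical.

```rust
/// Reversing row-group order only reorders indices; no data is buffered.
/// Assumes row groups are addressable by index.
pub fn reversed_row_group_order(num_row_groups: usize) -> Vec<usize> {
    (0..num_row_groups).rev().collect()
}

/// Naive reversal of a forward-only stream of pages: every page has to be
/// buffered before the first reversed page can be emitted. Returns the
/// reversed pages together with the peak number of pages held in memory.
pub fn reverse_by_buffering<I: Iterator>(pages: I) -> (Vec<I::Item>, usize) {
    let mut buffered: Vec<I::Item> = pages.collect();
    let peak_buffered = buffered.len();
    buffered.reverse();
    (buffered, peak_buffered)
}
```

In the second function the peak buffer size equals the total page count, which is the materialization cost the rationale above refers to.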
## Files Modified
- `datafusion/datasource-parquet/src/source.rs` - Added `reverse_pages` field
and methods
- `datafusion/datasource-parquet/src/opener.rs` - Added `reverse_pages` field
to builder
- Added four unit tests covering the `reverse_pages` flag (see Testing)
## Related Issues
Fix #19486
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]