alamb opened a new pull request, #7537:
URL: https://github.com/apache/arrow-rs/pull/7537
Draft until
- [ ] Change from RowSelection to an `enum`
- [ ] Write unit tests
# Which issue does this PR close?
This is a step towards implementing Adaptive parquet filter selections:
- #5523
# Rationale for this change
Part of the idea of adaptive decoding is the need for different read
strategies depending on the pattern of rows selected.
The current code mixes:
1. The determination of the exact read/skip pattern
2. The actual decoding of the rows
This makes it hard to add additional complexity to determining the read/skip
pattern. For example, @zhuqi-lucas had to put the bitmap selection logic in the
middle of the decoder here:
- https://github.com/apache/arrow-rs/pull/7524
Similarly to the way the `filter` kernel decides up front how to scan, I
think we should also change the parquet reader to determine what to do up front
and then just do it during decode.
Splitting the planning from the execution also gives us a place to generate
(and unit test) various heuristics for the plan.
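To illustrate why a separate planning step is easier to test, here is a minimal sketch of a heuristic that picks a read strategy from the selection pattern alone. Everything in it is hypothetical: the `Strategy` enum, the `choose_strategy` function, and the average-run-length rule are assumptions for illustration, not code from this PR or from #7524.

```rust
/// Hypothetical read strategies a plan could choose between.
#[derive(Debug, PartialEq)]
pub enum Strategy {
    /// Follow read/skip runs one at a time (good for long runs).
    Runs,
    /// Decode contiguous rows and apply a boolean mask (good for short, fragmented runs).
    Mask,
}

/// Pick a strategy from the lengths of the read/skip runs.
/// `min_avg_run` is an assumed tuning knob, not a value taken from arrow-rs.
pub fn choose_strategy(run_lengths: &[usize], min_avg_run: usize) -> Strategy {
    if run_lengths.is_empty() {
        return Strategy::Runs;
    }
    let total: usize = run_lengths.iter().sum();
    // Integer average run length; short averages indicate a fragmented selection.
    let avg = total / run_lengths.len();
    if avg >= min_avg_run {
        Strategy::Runs
    } else {
        Strategy::Mask
    }
}
```

Because the function takes only the run lengths and never touches a decoder, it can be unit tested with plain slices.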
Changes:
1. Move the calculation of when to read/emit rows into ReadPlan construction
2. Have the decoder simply follow the plan
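The two steps above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the `ReadPlan` in this PR: `Run`, `Step`, `build_plan`, and `execute` are hypothetical names, and a plain `&[i32]` slice stands in for a real column decoder.

```rust
/// One run of consecutive rows to decode or skip (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical plan step, computed before any decoding happens.
#[derive(Debug, PartialEq)]
pub enum Step {
    Skip(usize),
    Read(usize),
    /// Emit the rows read so far as one output batch.
    Emit,
}

/// Planning: decide the exact read/skip/emit pattern up front.
pub fn build_plan(runs: &[Run], batch_size: usize) -> Vec<Step> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut plan = Vec::new();
    let mut buffered = 0; // rows read since the last Emit
    for run in runs {
        if run.skip {
            if run.row_count > 0 {
                plan.push(Step::Skip(run.row_count));
            }
            continue;
        }
        let mut remaining = run.row_count;
        while remaining > 0 {
            // Read only as many rows as still fit in the current batch.
            let n = remaining.min(batch_size - buffered);
            plan.push(Step::Read(n));
            buffered += n;
            remaining -= n;
            if buffered == batch_size {
                plan.push(Step::Emit);
                buffered = 0;
            }
        }
    }
    if buffered > 0 {
        plan.push(Step::Emit); // final, possibly short, batch
    }
    plan
}

/// Execution: follow the plan with no decisions left to make.
pub fn execute(plan: &[Step], rows: &[i32]) -> Vec<Vec<i32>> {
    let mut pos = 0;
    let mut current = Vec::new();
    let mut batches = Vec::new();
    for step in plan {
        match step {
            Step::Skip(n) => pos += *n,
            Step::Read(n) => {
                current.extend_from_slice(&rows[pos..pos + *n]);
                pos += *n;
            }
            Step::Emit => batches.push(std::mem::take(&mut current)),
        }
    }
    batches
}
```

In this sketch all batching logic lives in `build_plan`, so the plan for a given selection can be asserted on directly without running a decoder.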
No change in behavior is intended: the selection evaluation is not
yet adaptive. This is meant to be a pure refactoring. I have added tests and a test
framework to make it easier to make this adaptive in the future.
# What changes are included in this PR?
# Are there any user-facing changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.