dubin555 opened a new pull request, #7332:
URL: https://github.com/apache/paimon/pull/7332
### Purpose
Linked issue: close #7230
`FileIndexPredicate.getRequiredNames()` calls `child.visit(this)` twice per
child in its `CompoundPredicate` visitor — once discarding the result, then
again to collect it. Since `PredicateBuilder.or()` produces right-nested binary
trees via `reduce()`, this doubles work at each tree level, resulting in O(2^n)
time complexity.
For an IN clause with 20 values (which produces a nested OR tree of depth
19), this means ~1,048,576 leaf visits instead of 20. In production, queries
with moderately sized IN clauses hang indefinitely.
The fix removes the redundant `child.visit(this)` call (line 130), matching
the correct pattern already used in `PredicateVisitor.FieldNameCollector`.
The bug was introduced in `ebdfa02bd` ("[hotfix] Correct visitors for
TransformPredicate"), which refactored the visitor to handle
`TransformPredicate` and accidentally left the duplicate call.
### Tests
- `FileIndexPredicateTest.testGetRequiredNamesLinearComplexity()` — builds a
20-element OR chain, counts leaf visits via `AtomicInteger`. Asserts exactly 20
visits (linear). Before fix: 1,048,575 visits (exponential).
- `FileIndexPredicateTest.testGetRequiredNamesPerformance()` — builds a
20-element OR chain, asserts completion within 100ms.
- `FileIndexPredicateTest.testGetRequiredNamesBasic()` — verifies
correctness: all field names are collected from a compound predicate.
- `FileIndexPredicateTest.testGetRequiredNamesSinglePredicate()` — verifies
single leaf predicate returns the correct field name.
### API and Format
No.
### Documentation
No.
### Generative AI tooling
Generated-by: Claude Code 1.0.33
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]