TheR1sing3un opened a new pull request, #7805:
URL: https://github.com/apache/paimon/pull/7805
## Background
Predicate pushdown in pypaimon spans three correctness-sensitive layers
(filter / file index / value stats) plus a couple of subtle invariants
around PK tables, deletion vectors, and post-merge value filtering. The
correctness obligations have so far been enforced by spot tests in a
handful of unrelated files; a regression that re-introduces a
``value_stats``-vs-``key_stats`` confusion or a per-file value filter
on PK rows would be hard to detect without targeted coverage.
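
To make the ``value_stats``-vs-``key_stats`` hazard concrete, here is a minimal sketch of a manifest gate. It is a toy model, not pypaimon's implementation: ``FileMeta``, ``overlaps`` and ``keep_file`` are hypothetical names, and the rationale in the comments (value stats on a PK file describe pre-merge rows) is an assumption drawn from the invariants listed in this PR.

```python
from collections import namedtuple

# Hypothetical stand-in for a manifest entry; not pypaimon's ManifestEntry.
# Both ranges are inclusive (min, max) pairs.
FileMeta = namedtuple("FileMeta", ["key_range", "value_range"])


def overlaps(stat_range, lo, hi):
    """True if the inclusive range [lo, hi] intersects the stats range."""
    return stat_range[0] <= hi and lo <= stat_range[1]


def keep_file(meta, is_pk_table, key_bounds=None, value_bounds=None):
    """Return True if the file must be read.

    key_bounds / value_bounds are inclusive (lo, hi) pairs or None.
    """
    if key_bounds is not None and not overlaps(meta.key_range, *key_bounds):
        return False
    if is_pk_table:
        # Assumption: value stats describe pre-merge rows, so dropping this
        # file could hide the newest version of a key. Ignore them.
        return True
    if value_bounds is not None and not overlaps(meta.value_range, *value_bounds):
        return False
    return True


# A file holding only key 1 with value 99; predicate is val in [0, 49].
newer = FileMeta(key_range=(1, 1), value_range=(99, 99))
pk_decision = keep_file(newer, is_pk_table=True, value_bounds=(0, 49))
append_decision = keep_file(newer, is_pk_table=False, value_bounds=(0, 49))
```

In this model the PK table keeps the file even though its value range misses the predicate, while the append-only table is free to drop it.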
## Effect
Adds a single test file that locks in the invariants as a directed
regression suite:
- **PK manifest pruning consults ``key_stats`` only** — covered by both
the value-only-predicate early-return path and the compound
(``id`` + ``val``) path, in each case using ``value_stats`` that
would drop the file if consulted.
- **DV-enabled PK tables hide L0 unconditionally** during read, so the
L0 value-stats pitfall doesn't apply.
- **Reader-level value predicate is post-merge** — writing a newer
value that fails ``< 50`` must not resurrect the older matching
value.
- **Append-only ``value_stats`` pruning** is applied and correct.
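
The post-merge invariant above can be illustrated with a small self-contained model. This is a sketch under simplifying assumptions, not the reader's actual code: rows are ``(key, sequence, value)`` tuples, merging keeps the highest sequence per key, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int, int]  # (primary key, sequence number, value)


def merge_latest(files: List[List[Row]]) -> Dict[int, int]:
    """Keep, for each primary key, the value with the highest sequence."""
    latest = {}  # key -> (sequence, value)
    for rows in files:
        for key, seq, val in rows:
            if key not in latest or seq > latest[key][0]:
                latest[key] = (seq, val)
    return {key: val for key, (_seq, val) in latest.items()}


def read_post_merge(files: List[List[Row]], pred: Callable[[int], bool]) -> Dict[int, int]:
    """Correct order: merge first, then apply the value predicate."""
    return {k: v for k, v in merge_latest(files).items() if pred(v)}


def read_per_file_filter(files: List[List[Row]], pred: Callable[[int], bool]) -> Dict[int, int]:
    """Buggy order: filter each file on value, then merge the survivors."""
    filtered = [[row for row in rows if pred(row[2])] for rows in files]
    return merge_latest(filtered)


def less_than_50(value: int) -> bool:
    return value < 50


old_file = [(1, 0, 10)]  # key 1, older value 10 matches val < 50
new_file = [(1, 1, 99)]  # key 1, newer value 99 fails val < 50

correct = read_post_merge([old_file, new_file], less_than_50)
buggy = read_per_file_filter([old_file, new_file], less_than_50)
```

Here ``correct`` is empty because the latest value for key 1 fails the predicate, whereas ``buggy`` returns the stale value 10: the "resurrection" that the round-trip tests guard against.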
Three layers of coverage:
1. **Unit** — synthetic ``ManifestEntry`` through
``FileScanner._filter_manifest_entry`` to assert the manifest
gate's exact behaviour without relying on full I/O.
2. **Round-trip** — real catalog writes/reads with a post-merge oracle.
3. **Property** — seeded random datasets + 12 random predicate ops
(incl. ``is_null``, re-covering a historical bug where missing
``null_counts`` caused ``isNull`` to drop every file). The suite adds no
``hypothesis`` dependency, which keeps the Python 3.6 compatibility
contract intact.
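
The property layer can be approximated without ``hypothesis`` by seeding the standard-library ``random.Random``. The sketch below is a toy model under stated assumptions, not the test file from this PR: files are plain lists of nullable ints, stats are a dict, and ``make_stats``, ``may_match``, ``row_matches`` and ``check_seed`` are hypothetical names. It covers four ops rather than the 12 used in the suite, and treats a missing ``null_count`` as "unknown" so that ``is_null`` keeps the file.

```python
import random


def make_stats(values, with_null_counts):
    """Build min/max/null_count stats; null_count may be absent (None)."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": sum(v is None for v in values) if with_null_counts else None,
    }


def may_match(stats, op, literal):
    """Conservative gate: return False only when no row can match."""
    if op == "is_null":
        # Missing null_count means "unknown", so the file must be kept.
        return stats["null_count"] is None or stats["null_count"] > 0
    if stats["min"] is None:  # no non-null values: comparisons cannot match
        return False
    if op == "lt":
        return stats["min"] < literal
    if op == "gt":
        return stats["max"] > literal
    if op == "eq":
        return stats["min"] <= literal <= stats["max"]
    raise ValueError(op)


def row_matches(value, op, literal):
    """Row-level predicate used by both the oracle and the pruned read."""
    if op == "is_null":
        return value is None
    if value is None:
        return False
    return {"lt": value < literal, "gt": value > literal, "eq": value == literal}[op]


def check_seed(seed):
    """Pruned read must return exactly what a full scan returns."""
    rng = random.Random(seed)
    files = [
        [rng.choice([None, rng.randint(0, 100)]) for _ in range(rng.randint(1, 8))]
        for _ in range(rng.randint(1, 5))
    ]
    with_null_counts = rng.random() < 0.5
    op = rng.choice(["lt", "gt", "eq", "is_null"])
    literal = rng.randint(0, 100)
    oracle = [v for rows in files for v in rows if row_matches(v, op, literal)]
    pruned = [
        v
        for rows in files
        if may_match(make_stats(rows, with_null_counts), op, literal)
        for v in rows
        if row_matches(v, op, literal)
    ]
    return pruned == oracle


all_ok = all(check_seed(seed) for seed in range(200))
```

Because every dataset is derived from the seed, a failing case can be reproduced by rerunning ``check_seed`` with the same number.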
No production code changes — this is pure regression coverage. DV
``read-mode`` (PERFORMANCE / FRESHNESS) is intentionally out of scope
because the option does not yet exist on the Python side; the
freshness-mode L0-visibility cases will land alongside that option in
a separate PR.
14 new test cases; full suite passes locally.