TheR1sing3un opened a new pull request, #7805:
URL: https://github.com/apache/paimon/pull/7805

   ## Background
   
   Predicate pushdown in pypaimon spans three correctness-sensitive layers
   (filter / file index / value stats) plus a couple of subtle invariants
   around PK tables, deletion vectors, and post-merge value filtering. The
   correctness obligations have so far been enforced by spot tests in a
   handful of unrelated files; a regression that re-introduces a
   ``value_stats``-vs-``key_stats`` confusion or a per-file value filter
   on PK rows would be hard to detect without targeted coverage.
   
   ## Effect
   
   Adds a single test file that locks in the invariants as a directed
   regression suite:
   
   - **PK manifest pruning consults ``key_stats`` only** — covered by both
     the value-only-predicate early-return path and the compound
     (``id`` + ``val``) path, in each case using ``value_stats`` that
     would drop the file if consulted.
   - **DV-enabled PK tables hide L0 unconditionally** during read, so the
     L0 value-stats pitfall doesn't apply.
   - **Reader-level value predicate is post-merge** — writing a newer
     value that fails ``< 50`` must not resurrect the older matching
     value.
   - **Append-only ``value_stats`` pruning** is applied and correct.
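
To make the post-merge invariant concrete, here is a minimal, self-contained sketch. It is a toy model and not pypaimon code: the ``(key, seq, value)`` row layout and the helper names ``merge_latest``, ``read_post_merge`` and ``read_per_file_filter`` are assumptions made for illustration. It shows why applying ``val < 50`` per file before the merge resurrects a stale row.

```python
# Toy model (hypothetical, not the pypaimon API): each "file" is a list of
# (key, seq, value) rows, and the row with the highest seq wins per key.

def merge_latest(files):
    """Merge files and keep only the newest value for each primary key."""
    latest = {}
    for rows in files:
        for key, seq, value in rows:
            if key not in latest or seq > latest[key][0]:
                latest[key] = (seq, value)
    return {key: value for key, (_seq, value) in latest.items()}


def read_post_merge(files, pred):
    """Correct order: merge first, then apply the value predicate."""
    return {k: v for k, v in merge_latest(files).items() if pred(v)}


def read_per_file_filter(files, pred):
    """Buggy order: filter rows inside each file, then merge the survivors."""
    filtered = [[row for row in rows if pred(row[2])] for rows in files]
    return merge_latest(filtered)


old_file = [(1, 0, 10)]   # older write: value 10 satisfies val < 50
new_file = [(1, 1, 99)]   # newer write: value 99 fails val < 50


def pred(value):
    return value < 50


correct = read_post_merge([old_file, new_file], pred)
buggy = read_per_file_filter([old_file, new_file], pred)
```

In this model ``correct`` is empty because key ``1`` now holds ``99``, while ``buggy`` still returns the overwritten value ``10``. Dropping a whole PK file based on its value min/max is the file-level form of the same mistake, which is why the manifest gate should only look at key statistics there.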
   
   Three layers of coverage:
   1. **Unit** — synthetic ``ManifestEntry`` through
      ``FileScanner._filter_manifest_entry`` to assert the manifest
      gate's exact behaviour without relying on full I/O.
   2. **Round-trip** — real catalog writes/reads with a post-merge oracle.
   3. **Property** — seeded random datasets + 12 random predicate ops
      (incl. ``is_null``, re-covering a historical bug where missing
      ``null_counts`` caused ``isNull`` to drop every file). The suite has
      no ``hypothesis`` dependency, which keeps the Python 3.6
      compatibility contract intact.
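
The property layer can be approximated with the standard library alone. The sketch below is a hedged illustration of the idea, not the test file from this PR: the stats dictionary, the five-operator set, and the helpers ``file_stats``, ``may_match``, ``scan`` and ``check_seed`` are all hypothetical. It compares a stats-pruned scan against an unpruned oracle on seeded random data and also exercises the case where null counts are missing.

```python
import random

OPS = ["eq", "lt", "gt", "is_null", "is_not_null"]


def file_stats(values):
    """Min/max/null-count stats for one (non-empty) toy file."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
    }


def row_matches(op, literal, value):
    if op == "is_null":
        return value is None
    if op == "is_not_null":
        return value is not None
    if value is None:
        return False  # comparisons never match a null value
    if op == "eq":
        return value == literal
    if op == "lt":
        return value < literal
    return value > literal  # "gt"


def may_match(op, literal, stats, row_count):
    """Conservative gate: return False only if no row can match."""
    nulls = stats["null_count"]
    if op == "is_null":
        return nulls is None or nulls > 0  # unknown count: keep the file
    if op == "is_not_null":
        return nulls is None or nulls < row_count
    if stats["min"] is None:
        return False  # in this toy model, no min means every value is null
    if op == "eq":
        return stats["min"] <= literal <= stats["max"]
    if op == "lt":
        return stats["min"] < literal
    return stats["max"] > literal  # "gt"


def scan(files, op, literal, prune, drop_null_counts=False):
    out = []
    for values in files:
        if prune:
            stats = file_stats(values)
            if drop_null_counts:
                stats["null_count"] = None  # simulate missing null counts
            if not may_match(op, literal, stats, len(values)):
                continue
        out.extend(v for v in values if row_matches(op, literal, v))
    return out


def check_seed(seed):
    rng = random.Random(seed)  # seeded, so any failure is reproducible
    domain = [None] + list(range(20))
    files = [[rng.choice(domain) for _ in range(rng.randint(1, 8))]
             for _ in range(rng.randint(1, 5))]
    for _ in range(10):
        op, literal = rng.choice(OPS), rng.randrange(20)
        oracle = scan(files, op, literal, prune=False)
        for missing in (False, True):
            pruned = scan(files, op, literal, True, drop_null_counts=missing)
            assert pruned == oracle, (seed, op, literal, missing)


for seed in range(200):
    check_seed(seed)
```

Including the seed in the assertion message is the stand-in for the shrinking and replay that ``hypothesis`` would otherwise provide: a failing case can be rerun directly with ``check_seed(seed)``.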
   
   No production code changes — this is pure regression coverage. DV
   ``read-mode`` (PERFORMANCE / FRESHNESS) is intentionally out of scope
   because the option does not yet exist on the Python side; the
   freshness-mode L0-visibility cases will land alongside that option in
   a separate PR.
   
   14 new test cases; full suite passes locally.

