Re: [PR] [python] Support query auth (row filter & column masking) for REST catalog [paimon]

via GitHub Thu, 11 Jun 2026 00:45:19 -0700


MgjLLL commented on PR #8136:
URL: https://github.com/apache/paimon/pull/8136#issuecomment-4678321413


   @JingsongLi All 3 issues fixed (+ several related correctness issues found 
during follow-up self-review). New commit `7d037c6a8` squashes the round-2 
fixes into one increment on top of `482fdadd7`. PTAL.
   
   ## Fixes for the three review comments
   
   ### 1. `catalog/table_query_auth.py` — wrapping drops `Plan.snapshot_id`
   
   `convert_plan` now returns `Plan(auth_splits, snapshot_id=plan.snapshot_id)`.
   
   Without this, query-auth tables planned from a non-empty snapshot lost 
row-id conflict / global-index update checks (`check_from_snapshot=-1`). New 
test `TestConvertPlanPreservesSnapshotId` covers both non-null and `None` cases.
   
   ### 2. `read/table_read.py` — `_create_split_read_with_read_type` bypasses 
PK widening
   
   Removed the parallel method. `_create_split_read` now accepts an optional 
`read_type`, so the auth path goes through the same `sequence.field` injection 
+ outer projection used by the normal PK read path.
   
   `id,val` projection over a PK table with `sequence.field=ts` no longer fails 
or merges by file sequence.
   
   ### 3. `common/predicate_json_parser.py` — `LIKE` escape mismatches JVM
   
   Ported `Like.sqlToRegexLike`: backslash is the default escape character, 
`\%` / `\_` are literal, invalid escape sequences raise.
   
   `LIKE admin\_%` now matches `admin_foo` (not `adminXfoo`), so policy 
predicates evaluate identically to the JVM client.
   
   ## Related correctness fixes (self-found, all on the same auth surface)
   
   - **Streaming path** previously stored `_query_auth` but never applied it. 
`StreamReadBuilder` now forwards `query_auth` and `read_type` to 
`AsyncStreamingTableScan`, which calls `_apply_auth` on initial / follow-up / 
catch-up plans. Catch-up uses `_create_initial_plan_raw` to avoid double auth.
   - **Iterator / parallel / Ray** paths consolidated through 
`_create_reader_for_split`. `to_iterator` gains a `RecordBatchReader` branch 
for PyTorch / generic iterator consumers. `scan_with_stats` applies auth for 
parity with `plan()`.
   - **Row kind preserved on PK + auth**: `RecordReaderToBatchAdapter` encodes 
`row_kind` into a `_row_kind` column when `include_row_kind=True`; 
`ColumnProjectReader` keeps `_row_kind` even when user projection drops it; 
`to_iterator` restores it via `OffsetRow.set_row_kind_byte` without leaking the 
column into `row_tuple`.
   - **Adapter chunking**: when an inner `read_batch` yields more rows than 
`chunk_size`, the surplus rows are carried over to the next flush instead of 
being dropped.
   - **`QueryAuthSplit` attribute delegation**: `__getattr__` forwards 
`file_size`, `file_paths`, `data_deletion_files`, `raw_convertible`, etc. to 
the inner split (Ray / explain / Daft paths), while guarding `_`-prefixed names 
to avoid pickle recursion.
   - **Daft `explain_scan` consistency**: `ExplainSplitInfo` now carries 
`has_auth`, populated from `QueryAuthSplit` in `_build_explain_result`. Daft's 
`_split_has_auth` accepts both real splits and the explain descriptor, so 
reported `reader_mode` and `fallback_reason="query auth active"` match the 
actual read path.
   - **Pickle safety when auth is disabled**: `table_query_auth` returns `None` 
instead of a local lambda, keeping `FileStoreTable` Ray-pickle-safe.
   
   ## Tests
   
   - New tests for snapshot_id preservation, row_kind through adapter / project 
reader, chunk-size carry-over, and `QueryAuthSplit` attribute delegation incl. 
pickle round-trip.
   - 45 query-auth related tests pass:
   
   ```
   pytest pypaimon/tests/ -k "explain or query_auth or table_query_auth"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [python] Support query auth (row filter & column masking) for REST catalog [paimon]

Reply via email to