MgjLLL commented on PR #8136: URL: https://github.com/apache/paimon/pull/8136#issuecomment-4678321413
@JingsongLi All 3 issues fixed (+ several related correctness issues found during follow-up self-review). New commit `7d037c6a8` squashes the round-2 fixes into one increment on top of `482fdadd7`. PTAL. ## Fixes for the three review comments ### 1. `catalog/table_query_auth.py` — wrapping drops `Plan.snapshot_id` `convert_plan` now returns `Plan(auth_splits, snapshot_id=plan.snapshot_id)`. Without this, query-auth tables planned from a non-empty snapshot lost row-id conflict / global-index update checks (`check_from_snapshot=-1`). New test `TestConvertPlanPreservesSnapshotId` covers both non-null and `None` cases. ### 2. `read/table_read.py` — `_create_split_read_with_read_type` bypasses PK widening Removed the parallel method. `_create_split_read` now accepts an optional `read_type`, so the auth path goes through the same `sequence.field` injection + outer projection used by the normal PK read path. `id,val` projection over a PK table with `sequence.field=ts` no longer fails or merges by file sequence. ### 3. `common/predicate_json_parser.py` — `LIKE` escape mismatches JVM Ported `Like.sqlToRegexLike`: backslash is the default escape character, `\%` / `\_` are literal, invalid escape sequences raise. `LIKE admin\_%` now matches `admin_foo` (not `adminXfoo`), so policy predicates evaluate identically to the JVM client. ## Related correctness fixes (self-found, all on the same auth surface) - **Streaming path** previously stored `_query_auth` but never applied it. `StreamReadBuilder` now forwards `query_auth` and `read_type` to `AsyncStreamingTableScan`, which calls `_apply_auth` on initial / follow-up / catch-up plans. Catch-up uses `_create_initial_plan_raw` to avoid double auth. - **Iterator / parallel / Ray** paths consolidated through `_create_reader_for_split`. `to_iterator` gains a `RecordBatchReader` branch for PyTorch / generic iterator consumers. `scan_with_stats` applies auth for parity with `plan()`. - **Row kind preserved on PK + auth**: `RecordReaderToBatchAdapter` encodes `row_kind` into a `_row_kind` column when `include_row_kind=True`; `ColumnProjectReader` keeps `_row_kind` even when user projection drops it; `to_iterator` restores it via `OffsetRow.set_row_kind_byte` without leaking the column into `row_tuple`. - **Adapter chunking**: when an inner `read_batch` yields more rows than `chunk_size`, the surplus rows are carried over to the next flush instead of being dropped. - **`QueryAuthSplit` attribute delegation**: `__getattr__` forwards `file_size`, `file_paths`, `data_deletion_files`, `raw_convertible`, etc. to the inner split (Ray / explain / Daft paths), while guarding `_`-prefixed names to avoid pickle recursion. - **Daft `explain_scan` consistency**: `ExplainSplitInfo` now carries `has_auth`, populated from `QueryAuthSplit` in `_build_explain_result`. Daft's `_split_has_auth` accepts both real splits and the explain descriptor, so reported `reader_mode` and `fallback_reason="query auth active"` match the actual read path. - **Pickle safety when auth is disabled**: `table_query_auth` returns `None` instead of a local lambda, keeping `FileStoreTable` Ray-pickle-safe. ## Tests - New tests for snapshot_id preservation, row_kind through adapter / project reader, chunk-size carry-over, and `QueryAuthSplit` attribute delegation incl. pickle round-trip. - 45 query-auth related tests pass: ``` pytest pypaimon/tests/ -k "explain or query_auth or table_query_auth" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
