MgjLLL commented on PR #8136:
URL: https://github.com/apache/paimon/pull/8136#issuecomment-4648668415
> I found a few correctness issues in the query-auth paths introduced here:
>
> 1. `paimon-python/pypaimon/read/datasource/split_provider.py:127`
constructs `ReadBuilder(self._ensure_table())` directly. That bypasses
`FileStoreTable.new_read_builder()`, which is where the REST query auth is
injected. As a result, `pypaimon.ray.read_paimon(...)` can read REST tables
without applying server-side row filters or column masking. I think this should
either call `self._ensure_table().new_read_builder()` or explicitly pass the
table's query auth into the builder, with a Ray regression test for row
filtering/masking.
> 2. `paimon-python/pypaimon/read/stream_read_builder.py:117` stores
`_query_auth`, but `new_streaming_scan()` does not pass it into
`AsyncStreamingTableScan`. The plans returned from
`streaming_table_scan.py:322` and `streaming_table_scan.py:386` also do not go
through `auth_result.convert_plan()`. So `table.new_stream_read_builder()`
skips row filters and column masking for both the initial scan and later
delta/changelog scans. The streaming scan should preserve and apply query auth
before returning each plan.
> 3. The auth reader wrappers currently assume the inner reader supports
`read_arrow_batch()`
(`paimon-python/pypaimon/read/reader/auth_masking_reader.py:38` and `:66`). For
primary-key tables with non-raw-convertible splits, `TableRead` can create a
`MergeFileSplitRead`, whose `create_reader()` returns the normal row
`RecordReader` path rather than a `RecordBatchReader`. Wrapping that in
`AuthFilterReader`/`AuthMaskingReader` will fail with `AttributeError` when
query auth is enabled. This needs either a row-reader auth path, conversion to
a batch-capable reader before wrapping, or routing/rejecting these splits
explicitly.
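
To make point 1 concrete, here is a minimal, self-contained sketch of the bypass pattern. The classes are hypothetical stand-ins, not pypaimon's real `FileStoreTable` or `ReadBuilder`; the only thing taken from the review is that `new_read_builder()` is the place where query auth gets injected, so constructing the builder directly skips it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class QueryAuth:
    """Hypothetical stand-in for a server-provided row filter."""
    row_filter: Callable[[Row], bool]


class ReadBuilder:
    """Hypothetical stand-in: auth is optional and defaults to None."""

    def __init__(self, table: "Table", query_auth: Optional[QueryAuth] = None):
        self.table = table
        self.query_auth = query_auth

    def read(self) -> List[Row]:
        rows = self.table.rows
        if self.query_auth is None:
            # No auth injected: every row is returned unfiltered.
            return list(rows)
        return [r for r in rows if self.query_auth.row_filter(r)]


class Table:
    """Hypothetical stand-in for a table whose factory injects auth."""

    def __init__(self, rows: List[Row], query_auth: QueryAuth):
        self.rows = rows
        self._query_auth = query_auth

    def new_read_builder(self) -> ReadBuilder:
        # The factory is the single place where auth is attached.
        return ReadBuilder(self, query_auth=self._query_auth)


table = Table(
    rows=[{"id": 1, "tenant": "a"}, {"id": 2, "tenant": "b"}],
    query_auth=QueryAuth(row_filter=lambda r: r["tenant"] == "a"),
)

bypassed = ReadBuilder(table).read()        # direct construction skips auth
enforced = table.new_read_builder().read()  # factory applies the row filter
```

In this toy model `bypassed` contains both rows while `enforced` contains only the row the filter allows, which is the same shape of leak described for the Ray read path.
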
@JingsongLi All 3 issues fixed (+ 1 additional parallel path bypass found
during analysis). See updated PR description. PTAL.
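
For readers following along, the failure mode in point 3 and one of the remedies suggested there (converting to a batch-capable reader before wrapping) can be sketched as below. This is only an illustration under stated assumptions, not the change made in the PR: every class is a hypothetical stand-in, and plain Python lists take the place of Arrow record batches.

```python
from typing import Dict, Iterator, List, Optional

Row = Dict[str, object]


class RowReader:
    """Hypothetical row-oriented reader: exposes read_row() only."""

    def __init__(self, rows: List[Row]):
        self._it: Iterator[Row] = iter(rows)

    def read_row(self) -> Optional[Row]:
        return next(self._it, None)


class RowToBatchAdapter:
    """Hypothetical adapter giving a row reader a read_arrow_batch() method."""

    def __init__(self, inner: RowReader, batch_size: int = 2):
        self._inner = inner
        self._batch_size = batch_size

    def read_arrow_batch(self) -> Optional[List[Row]]:
        batch: List[Row] = []
        while len(batch) < self._batch_size:
            row = self._inner.read_row()
            if row is None:
                break
            batch.append(row)
        return batch or None  # None signals end of input


class MaskingReader:
    """Hypothetical auth wrapper that assumes a batch-capable inner reader."""

    def __init__(self, inner, masked_column: str):
        self._inner = inner
        self._masked_column = masked_column

    def read_arrow_batch(self) -> Optional[List[Row]]:
        batch = self._inner.read_arrow_batch()
        if batch is None:
            return None
        # Replace the masked column's value in every row of the batch.
        return [{**r, self._masked_column: "***"} for r in batch]


rows = [{"id": 1, "email": "x@example.com"}, {"id": 2, "email": "y@example.com"}]

# Wrapping the row reader directly fails: it has no read_arrow_batch().
try:
    MaskingReader(RowReader(rows), "email").read_arrow_batch()
    direct_error = None
except AttributeError as exc:
    direct_error = exc

# Adapting first lets the same wrapper work unchanged.
masked = MaskingReader(RowToBatchAdapter(RowReader(rows)), "email").read_arrow_batch()
```

The adapter keeps the auth wrappers batch-only, at the cost of buffering rows; a dedicated row-reader auth path would avoid the buffering but needs a second wrapper implementation.
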