MgjLLL commented on PR #8136:
URL: https://github.com/apache/paimon/pull/8136#issuecomment-4648661464

     Fixes for issues raised by @JingsongLi, plus one additional issue found 
during analysis.
     8 files changed, +188 -13 lines.
   
     #### Fix 1: Ray read path bypasses auth (`split_provider.py`)
   
     `_ensure_planned()` directly constructed `ReadBuilder(table)`, bypassing 
the `query_auth` injection from `FileStoreTable.new_read_builder()`.
   
     **Fix:** Changed to `self._ensure_table().new_read_builder()`. Removed the 
direct `ReadBuilder` import.
   
     #### Fix 2: Streaming read path skips auth entirely 
(`streaming_table_scan.py`, `stream_read_builder.py`)
   
     `StreamReadBuilder` stored `_query_auth` but never passed it to 
`AsyncStreamingTableScan`. All three plan creation methods 
(initial/follow-up/catch-up) produced plans without auth wrapping.
   
     **Fix:**
     - Added `query_auth` parameter to `AsyncStreamingTableScan.__init__()`
     - Added `_apply_auth(plan)` method that calls `query_auth(select)` → 
`convert_plan(plan)`
     - Applied auth to `_create_initial_plan`, `_create_follow_up_plan`, 
`_create_catch_up_plan`
     - Extracted `_create_initial_plan_raw()` to avoid double auth in catch-up 
path
     - `StreamReadBuilder.new_streaming_scan()` now passes `query_auth`
   
     #### Fix 3: Primary-key table merge reader incompatible with auth wrappers 
(`auth_masking_reader.py`, `table_read.py`)
   
     `MergeFileSplitRead.create_reader()` returns a `RecordReader` (row-level), 
but `AuthFilterReader`/`AuthMaskingReader` require `RecordBatchReader` 
(batch-level with `read_arrow_batch()`). Java has a unified 
`RecordReader<InternalRow>` so this
     mismatch doesn't exist in JVM.
   
     **Fix:**
     - Added `RecordReaderToBatchAdapter(RecordBatchReader)` that collects 
`OffsetRow` tuples from `read_batch()`/`next()` and converts them to 
`pa.RecordBatch`
     - `_authed_reader()` now detects non-`RecordBatchReader` and wraps with 
the adapter before applying auth wrappers
   
     #### Fix 4: Parallel read path bypasses auth + missing `raw_convertible` 
proxy (`table_read.py`, `query_auth_split.py`)
   
     `_read_one_split_to_batches()` called 
`_create_split_read(split).create_reader()` directly, bypassing 
`QueryAuthSplit` detection. Additionally, `QueryAuthSplit` lacked a 
`raw_convertible` property proxy, causing `AttributeError` when accessed.
   
     **Fix:**
     - Changed to `self._create_reader_for_split(split)` which correctly 
handles `QueryAuthSplit`
     - Added `raw_convertible` property proxy to `QueryAuthSplit`
   
     ### New Tests (for fixes)
   
     - `TestQueryAuthSplitRawConvertible` (2 tests) — verifies 
`raw_convertible` proxy for true/false
     - `TestRecordReaderToBatchAdapter` (5 tests) — basic conversion, 
multi-batch, empty reader, close delegation, integration with `AuthFilterReader`
   
     All 90 auth-related tests pass. Full test suite: 318/324 pass (6 
pre-existing failures unrelated to this PR).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to