QuakeWang opened a new pull request, #7965:
URL: https://github.com/apache/paimon/pull/7965
### Purpose
The Daft Paimon datasource used the configured read builder for scan
planning, but fallback split tasks rebuilt a bare `TableRead`. As a result,
fallback reads for PK merge, non-Parquet formats, BLOB columns, and
deletion-vector paths could miss pushed predicate/projection/limit state.
This was correctness-sensitive because Daft may treat pushed filters as
already handled by the source. A query filtering a fallback table could plan
the right split but still emit unfiltered rows from that split. There was also
a related limit-ordering issue: applying `limit` inside the fallback reader
while Daft still had remaining row or partition filters could truncate rows
before those filters were evaluated.
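
The limit-ordering hazard can be shown in plain Python, independent of Daft or Paimon. This is only a minimal sketch: the sample rows and the helpers `read_split` and `remaining_filter` are hypothetical stand-ins and are not taken from the patch.

```python
from itertools import islice

# Hypothetical rows held by a single fallback split.
rows = [{"id": i, "dt": "2024-01-01" if i < 3 else "2024-01-02"} for i in range(6)]


def read_split(rows, limit=None):
    """Stand-in for a fallback reader that may apply a limit itself."""
    return list(islice(rows, limit))


def remaining_filter(row):
    """Stand-in for a filter the engine still evaluates after the source."""
    return row["dt"] == "2024-01-02"


# Unsafe order: the reader truncates first, then the engine filters.
unsafe = [r for r in read_split(rows, limit=2) if remaining_filter(r)]

# Safe order: the engine filters first, then applies the limit.
safe = list(islice((r for r in read_split(rows) if remaining_filter(r)), 2))
```

In the unsafe order the two rows kept by the reader are both rejected by the filter, so the query returns nothing even though matching rows exist in the split.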
This patch makes fallback tasks use a configured `TableRead`, keeps the
required filter columns available for fallback execution, and centralizes the
source-side limit decision so `limit` is only pushed when it is safe for the
source to apply it before returning rows.
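
One way to picture the centralized decision is a pair of small helpers, sketched below. The names `source_side_limit` and `columns_to_read` are hypothetical and do not come from `daft_datasource.py`; the sketch assumes the only condition that makes a source-side limit unsafe is the presence of filters the engine still has to evaluate.

```python
from typing import List, Optional, Sequence


def source_side_limit(limit: Optional[int],
                      remaining_filters: Sequence[object]) -> Optional[int]:
    """Return the limit the source may apply, or None to leave it to the engine."""
    if limit is None:
        return None
    # Assumption: any filter still pending in the engine makes early truncation unsafe.
    if remaining_filters:
        return None
    return limit


def columns_to_read(projection: Sequence[str],
                    filter_columns: Sequence[str]) -> List[str]:
    """Projected columns plus any extra columns the filters need, without duplicates."""
    seen = set()
    ordered = []
    for name in [*projection, *filter_columns]:
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered
```

Keeping both decisions in one place means the scan-planning path and the fallback path cannot disagree about whether a limit was already applied or which columns a split must return.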
### Tests
```shell
# Byte-compile the changed modules to catch syntax errors.
python -m py_compile \
  paimon-python/pypaimon/daft/daft_datasource.py \
  paimon-python/pypaimon/tests/daft/daft_data_test.py
# Run the Daft datasource tests.
python -m pytest paimon-python/pypaimon/tests/daft/daft_data_test.py -q
# Run the BLOB write/read round-trip test.
python -m pytest \
  paimon-python/pypaimon/tests/daft/daft_sink_test.py::TestBlobType::test_write_read_blob_type \
  -q
```