[PR] [python] Fix Daft fallback filter limit pushdown [paimon]

via GitHub Mon, 25 May 2026 19:38:46 -0700


QuakeWang opened a new pull request, #7965:
URL: https://github.com/apache/paimon/pull/7965


   ### Purpose
   
   The Daft Paimon datasource used the configured read builder for scan 
planning, but fallback split tasks rebuilt a bare `TableRead`. As a result, 
fallback reads for primary-key (PK) merge, non-Parquet formats, BLOB columns, 
and deletion-vector paths could lose the pushed-down predicate, projection, and limit.
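The failure mode can be modeled in a few lines. This is a minimal sketch built on a simplified, hypothetical read model: `ReadConfig`, `TableRead`, and the list-of-dicts splits below are illustrative stand-ins, not the pypaimon API. The scan side carries the pushed state, but a reader rebuilt from scratch does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class ReadConfig:
    # Hypothetical pushed-down state that a read builder would carry.
    predicate: Optional[Callable[[Row], bool]] = None
    projection: Optional[List[str]] = None
    limit: Optional[int] = None


class TableRead:
    # Hypothetical reader: applies only the state it was constructed with.
    def __init__(self, config: Optional[ReadConfig] = None) -> None:
        self.config = config or ReadConfig()

    def read(self, split: List[Row]) -> List[Row]:
        rows = split
        if self.config.predicate is not None:
            rows = [r for r in rows if self.config.predicate(r)]
        if self.config.limit is not None:
            rows = rows[: self.config.limit]
        if self.config.projection is not None:
            rows = [{c: r[c] for c in self.config.projection} for r in rows]
        return rows


split = [{"id": i, "v": i * 10} for i in range(5)]
pushed = ReadConfig(predicate=lambda r: r["id"] >= 3, projection=["v"])

# Buggy fallback: a bare reader ignores the pushed predicate and projection.
bare_rows = TableRead().read(split)
# Fixed fallback: the reader is built from the same configuration as the scan.
configured_rows = TableRead(pushed).read(split)

print(len(bare_rows))     # 5
print(configured_rows)    # [{'v': 30}, {'v': 40}]
```

If the engine has already marked the predicate as handled by the source, the five rows from the bare reader would reach the query result unfiltered.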
   
   This was correctness-sensitive because Daft may treat pushed filters as 
already handled by the source. A query filtering a fallback table could plan 
the right split but still emit unfiltered rows from that split. There was also 
a related limit-ordering issue: applying `limit` inside the fallback reader 
while Daft still had remaining row or partition filters could truncate rows 
before those filters were evaluated.
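The ordering problem can be reproduced with plain lists. This sketch assumes a hypothetical partition filter (`in_odd_partition`) that the source did not apply, so the engine evaluates it after the source returns rows; the helper names are illustrative only.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
RowFilter = Callable[[Row], bool]


def limit_then_filter(rows: List[Row], limit: int, remaining: RowFilter) -> List[Row]:
    # Unsafe: the source truncates first, then the engine applies its remaining filter.
    return [r for r in rows[:limit] if remaining(r)]


def filter_then_limit(rows: List[Row], limit: int, remaining: RowFilter) -> List[Row]:
    # Safe: the remaining filter runs first, so the limit counts only matching rows.
    return [r for r in rows if remaining(r)][:limit]


def in_odd_partition(row: Row) -> bool:
    # Hypothetical partition filter that the source did not apply.
    return row["part"] == 1


rows = [{"id": i, "part": i % 2} for i in range(10)]

print([r["id"] for r in limit_then_filter(rows, 3, in_odd_partition)])  # [1]
print([r["id"] for r in filter_then_limit(rows, 3, in_odd_partition)])  # [1, 3, 5]
```

With `limit=3`, truncating first leaves only one matching row even though at least three exist.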
   
   This patch makes fallback tasks use a configured `TableRead`, keeps the 
required filter columns available for fallback execution, and centralizes the 
source-side limit decision so `limit` is only pushed when it is safe for the 
source to apply it before returning rows.
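The two rules in the previous paragraph can be sketched as small helpers. This is a hypothetical illustration, not the code in `daft_datasource.py`: `PushdownState`, `source_limit`, and `read_columns` are assumed names, and filters are represented as plain strings for brevity. It assumes that "keeping filter columns available" means widening the projection so every column a filter references is still read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class PushdownState:
    # Hypothetical summary of what the engine pushed into the source.
    limit: Optional[int] = None
    # Row filters the source could not apply; the engine evaluates them later.
    remaining_filters: List[str] = field(default_factory=list)
    # Partition filters the engine still evaluates after the source returns rows.
    remaining_partition_filters: List[str] = field(default_factory=list)


def source_limit(state: PushdownState) -> Optional[int]:
    # A limit is safe at the source only when no filter runs after the source;
    # otherwise truncating early could drop rows that would have matched.
    if state.limit is None:
        return None
    if state.remaining_filters or state.remaining_partition_filters:
        return None
    return state.limit


def read_columns(output_columns: Sequence[str], filter_columns: Sequence[str]) -> List[str]:
    # Widen the projection so columns referenced by filters are still read.
    # Output columns keep their order; filter-only columns are appended once.
    columns = list(output_columns)
    for col in filter_columns:
        if col not in columns:
            columns.append(col)
    return columns


print(source_limit(PushdownState(limit=10)))                              # 10
print(source_limit(PushdownState(limit=10, remaining_filters=["v > 3"])))  # None
print(read_columns(["name"], ["id", "name"]))                             # ['name', 'id']
```

Keeping the limit decision in one function means every read path (native and fallback) asks the same question before truncating. The sketch leaves out dropping filter-only columns once the filter has been evaluated.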
   
   ### Tests
   
   ```shell
   python -m py_compile \
     paimon-python/pypaimon/daft/daft_datasource.py \
     paimon-python/pypaimon/tests/daft/daft_data_test.py
   
   python -m pytest paimon-python/pypaimon/tests/daft/daft_data_test.py -q
   
   python -m pytest \
     paimon-python/pypaimon/tests/daft/daft_sink_test.py::TestBlobType::test_write_read_blob_type \
     -q
   ```

