Re: [PR] [python][daft] Make Daft Paimon read source serializable [paimon]

via GitHub Fri, 29 May 2026 02:53:52 -0700


kerwin-zk commented on code in PR #8029:
URL: https://github.com/apache/paimon/pull/8029#discussion_r3323546641



##########
paimon-python/pypaimon/daft/daft_datasource.py:
##########
@@ -492,6 +665,14 @@ def _source_limit(
     def _requires_fallback_reader(self) -> bool:
         return not self._is_parquet or self._has_blob_columns or self._table.is_primary_key_table
 
+    def _requires_serializable_paimon_reader_task(self) -> bool:
+        if self._warehouse_scheme in ("", "file"):

Review Comment:
   You're right, thanks. I removed the gate: a normal append-only Parquet table 
on OSS now goes through Daft's native reader under the Ray runner. This is safe 
because both the source and the pypaimon fallback task serialize only 
rebuildable metadata (catalog options, identifier, table path), and the native 
task carries Daft's own picklable `StorageConfig`. The pypaimon reader task is 
now used only for splits that genuinely need it (PK/LSM merge, non-Parquet, 
blob columns, deletion vectors). I verified end-to-end that community Daft 
reading an append-only Parquet table on OSS under the Ray runner uses the 
native reader and returns correct results.
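
For readers following the thread, here is a minimal sketch of the "serialize only rebuildable metadata" idea, not the code in this PR. `LiveTable` and `RebuildableReadTask` are hypothetical names, the `oss://` path is made up, and the lock stands in for any non-picklable resource a real table handle might hold. The task pickles just the catalog options, identifier, and table path, then reopens the table lazily on the worker.

```python
import pickle
import threading
from typing import Any, Dict, Optional


class LiveTable:
    """Hypothetical stand-in for a live table handle that cannot be pickled."""

    def __init__(self, catalog_options: Dict[str, Any], identifier: str, table_path: str):
        self.catalog_options = dict(catalog_options)
        self.identifier = identifier
        self.table_path = table_path
        # A lock is used here only to make the handle non-picklable.
        self._lock = threading.Lock()


class RebuildableReadTask:
    """Hypothetical read task that ships metadata and rebuilds its table on demand."""

    def __init__(self, catalog_options: Dict[str, Any], identifier: str, table_path: str):
        self._catalog_options = dict(catalog_options)
        self._identifier = identifier
        self._table_path = table_path
        self._table: Optional[LiveTable] = None

    @property
    def table(self) -> LiveTable:
        # Reopen the table from metadata the first time it is needed.
        if self._table is None:
            self._table = LiveTable(
                self._catalog_options, self._identifier, self._table_path
            )
        return self._table

    def __getstate__(self) -> Dict[str, Any]:
        # Drop the live handle so only plain metadata crosses the process boundary.
        state = self.__dict__.copy()
        state["_table"] = None
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)


task = RebuildableReadTask(
    {"warehouse": "oss://example-bucket/warehouse"},  # made-up options
    "db.tbl",
    "oss://example-bucket/warehouse/db.db/tbl",
)
_ = task.table  # force the live handle to exist before pickling
restored = pickle.loads(pickle.dumps(task))
```

After the round trip, `restored` holds no live handle until `restored.table` is accessed, which is the property a distributed runner needs when it ships tasks to workers.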



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
