Re: [PR] [python] Add parallel split reading to to_pandas / to_arrow [paimon]

via GitHub Fri, 15 May 2026 22:49:49 -0700


TheR1sing3un commented on code in PR #7870:
URL: https://github.com/apache/paimon/pull/7870#discussion_r3252265966



##########
paimon-python/pypaimon/read/table_read.py:
##########
@@ -104,13 +141,31 @@ def _try_to_pad_batch_by_schema(batch: pyarrow.RecordBatch, target_schema):
 
         return pyarrow.RecordBatch.from_arrays(columns, schema=target_schema)
 
-    def to_arrow(self, splits: List[Split]) -> Optional[pyarrow.Table]:
-        batch_reader = self.to_arrow_batch_reader(splits)
+    def to_arrow(
+        self,
+        splits: List[Split],
+        max_workers: Optional[int] = None,

Review Comment:
   > Maybe it is better to provide a table option `read.parallelism`?
   
   Good suggestion. I have added this option.
   One follow-up question: now that the parallelism is a table option, do read
   APIs such as `to_pandas`/`to_arrow` still need a parameter that overrides it?
   I have kept the `max_workers` parameter for now. If you think it is
   unnecessary, I can remove it.
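
To make the question concrete, here is a minimal sketch of one possible precedence rule, assuming an explicit `max_workers` argument wins over the `read.parallelism` table option. The helpers `resolve_parallelism` and `read_splits`, the plain-dict options, and the fallback of 1 are hypothetical and only for illustration; they are not the code in this PR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def resolve_parallelism(
    max_workers: Optional[int],
    table_options: Dict[str, str],
    default: int = 1,  # assumed fallback, not taken from the PR
) -> int:
    """Pick the worker count: explicit argument first, then the table option."""
    if max_workers is not None:
        workers = max_workers
    else:
        workers = int(table_options.get("read.parallelism", default))
    if workers < 1:
        raise ValueError(f"parallelism must be >= 1, got {workers}")
    return workers


def read_splits(
    splits: List[S],
    read_one: Callable[[S], R],
    max_workers: Optional[int] = None,
    table_options: Optional[Dict[str, str]] = None,
) -> List[R]:
    """Read every split, in parallel when more than one worker is resolved."""
    workers = resolve_parallelism(max_workers, table_options or {})
    if workers == 1 or len(splits) <= 1:
        # Serial path: avoids thread-pool overhead for the trivial case.
        return [read_one(split) for split in splits]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so split order is kept.
        return list(pool.map(read_one, splits))
```

Under this rule the table option acts as the default for every read on the table, and a caller can still lower or raise it for a single `to_arrow`/`to_pandas` call without changing the table configuration.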



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
