TheR1sing3un commented on code in PR #7870:
URL: https://github.com/apache/paimon/pull/7870#discussion_r3252265966
##########
paimon-python/pypaimon/read/table_read.py:
##########
@@ -104,13 +141,31 @@ def _try_to_pad_batch_by_schema(batch: pyarrow.RecordBatch, target_schema):
         return pyarrow.RecordBatch.from_arrays(columns, schema=target_schema)

-    def to_arrow(self, splits: List[Split]) -> Optional[pyarrow.Table]:
-        batch_reader = self.to_arrow_batch_reader(splits)
+    def to_arrow(
+        self,
+        splits: List[Split],
+        max_workers: Optional[int] = None,
Review Comment:
> Maybe it is better to provide a table option `read.parallelism`?
Good suggestion. I have added this option.

One follow-up question: now that parallelism is a table option, do read APIs
such as `to_pandas`/`to_arrow` still need a parameter that overrides the
table-level value for a single call? I have kept the `max_workers` parameter
for now. If you think it is unnecessary, I can remove it.
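
To make the question concrete, here is a minimal sketch of the precedence I have in mind: an explicit per-call `max_workers` wins, otherwise the `read.parallelism` table option applies, otherwise a default is used. The helper name `resolve_parallelism`, the plain-dict options lookup, and the default of 1 are assumptions for illustration only, not the code in this PR.

```python
from typing import Dict, Optional

# Option name taken from the review discussion; how pypaimon actually
# stores and parses table options is not shown here.
READ_PARALLELISM = "read.parallelism"


def resolve_parallelism(
    table_options: Dict[str, str],
    max_workers: Optional[int] = None,
    default: int = 1,
) -> int:
    """Hypothetical helper: pick the effective read parallelism.

    Precedence: per-call argument > table option > default.
    """
    if max_workers is not None:
        # The caller overrides the table-level setting for this read only.
        value = max_workers
    elif READ_PARALLELISM in table_options:
        # Table options are assumed to be stored as strings.
        value = int(table_options[READ_PARALLELISM])
    else:
        value = default

    if value < 1:
        raise ValueError(f"parallelism must be >= 1, got {value}")
    return value
```

With this shape, a table configured with `read.parallelism = 4` reads with 4 workers unless a call passes `max_workers` explicitly. Dropping the parameter would remove only the first branch.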