ianmcook commented on issue #44561:
URL: https://github.com/apache/arrow/issues/44561#issuecomment-2444773143
It's easy enough to create a record batch reader from a collection of
Arrow IPC stream files that share the same schema, like this:
```py
import glob
import pyarrow as pa

def get_schema(paths):
    # All files are assumed to share the same schema, so read it from the first one
    with open(paths[0], "rb") as file:
        reader = pa.ipc.open_stream(file)
        return reader.schema

def get_batches(paths):
    # Lazily yield the record batches from each file in turn
    for path in paths:
        with open(path, "rb") as file:
            reader = pa.ipc.open_stream(file)
            for batch in reader:
                yield batch

paths = glob.glob("*.arrows")
reader = pa.RecordBatchReader.from_batches(
    get_schema(paths),
    get_batches(paths),
)
```
Still, it would be nice to have a method in PyArrow that makes this more
efficient and concise to express.
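For illustration only, a sketch of the kind of one-liner this might enable (the function name here is purely hypothetical and does not exist in PyArrow today):
```py
import glob
import pyarrow as pa

paths = glob.glob("*.arrows")

# Hypothetical convenience API: open several IPC stream files that share
# a schema as a single RecordBatchReader (name is illustrative only)
reader = pa.ipc.open_streams(paths)
```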