westonpace commented on issue #13827:
URL: https://github.com/apache/arrow/issues/13827#issuecomment-1209847707
@drin is correct, these functions are not exposed in pyarrow at the moment.
However, from pyarrow, if you use the datasets API to read those files, it
should achieve the desired effect:
```
import pyarrow.dataset as ds
my_dataset = ds.dataset(['/tmp/my_ipc.arrow'], format='arrow')
my_table = my_dataset.to_table()
```
Another benefit: if you can save your file as multiple record batches, you
can start processing before the entire file has been loaded into memory. I'm
not sure whether that is workable for you or not:
```
import pyarrow.dataset as ds
my_dataset = ds.dataset(['/tmp/my_ipc.arrow'], format='arrow')
for next_batch in my_dataset.to_batches():
    # Do something with the batch
    print(next_batch)
```