lnicola opened a new issue #7338:
URL: https://github.com/apache/arrow/issues/7338
I'm running this query over a 14 GB Arrow IPC file:
```python
>>> from pyarrow import dataset
>>> ds = dataset.dataset("foo.ipc", format="ipc")
>>> t = ds.to_table(filter=dataset.field('ID') <= 1000).to_pandas()
>>> t
[snip]
[914 rows x 617 columns]
```
If I'm reading the documentation correctly, this should scan the file and
collect only the matching rows, without loading the whole file into memory.
However, the RSS grows to about 14 GB while the query runs, then drops back down.
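
To put a number on the RSS observation, here is a minimal sketch that records the process's peak resident set size around a call. It uses only the standard library `resource` module, so it is Unix-only. The helper names `peak_rss_mib` and `run_and_report` are hypothetical, made up for this illustration, and the workload in the `__main__` block is a stand-in rather than the real query.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(query):
    """Run query() and return (result, peak RSS before, peak RSS after)."""
    before = peak_rss_mib()
    result = query()
    after = peak_rss_mib()
    return result, before, after


if __name__ == "__main__":
    # Stand-in workload (assumption); swap in the dataset query to reproduce.
    result, before, after = run_and_report(lambda: bytearray(50 * 1024 * 1024))
    print(f"peak RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

For the query above, `query` would be something like `lambda: ds.to_table(filter=dataset.field('ID') <= 1000)`. Because this reports the peak rather than the current RSS, it still captures the spike after memory has gone back down.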