ravwojdyla opened a new issue, #35393:
URL: https://github.com/apache/arrow/issues/35393
### Describe the bug, including details regarding any error messages, version, and platform.
We have code that fetches the parquet schema from a file using pyarrow; here's a minimal example:
```py
import pyarrow.parquet as pq
with open("/tmp/part.snappy.parquet", mode="rb") as fd:
s = pq.read_schema(fd)
```
That example file is about 288MB, and we've noticed that the resident memory
usage of this code spikes to close to 500MB:
<img width="1124" alt="image"
src="https://user-images.githubusercontent.com/1419010/235752389-504c0e3c-93ef-4a54-8bfc-62aed6d85417.png">
Is it expected that fetching just the schema requires allocating this much memory?
Worth noting that the memory is eventually freed. Should some arguments be
tweaked, or is this a bug somewhere?
```sh
> du -sh /tmp/part.snappy.parquet
288M /tmp/part.snappy.parquet
```
Versions (py 3.10):
```
> conda list | grep arrow
arrow-cpp 12.0.0 hce30654_0_cpu conda-forge
libarrow 12.0.0 h3b4cbd9_0_cpu conda-forge
pyarrow 12.0.0 py310h7c67832_0_cpu conda-forge
```
### Component(s)
Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]