MMCMA commented on issue #46929: URL: https://github.com/apache/arrow/issues/46929#issuecomment-3048139846
Not sure if the following issue belongs here, but it seems related: we observed a significant increase in memory consumption when upgrading from `pyarrow` 17.0.0 to later versions, particularly during calls to `.to_pandas()`. For example, comparing memory usage between `pyarrow` 17.0.0 and 20.0.0:

```
tbl_a : 14.0 GB -> 19.3 GB
tbl_b : 17.9 GB -> 39.3 GB
tbl_c : 23.7 GB -> 46.4 GB
```

The function call is simply:

```python
tbl.to_pandas(self_destruct=True, split_blocks=True)
```

Has anything changed in the `.to_pandas()` implementation or memory management post-17.0.0 that could explain this behavior? Other than downgrading back to 17.0.0, is there a recommended way to mitigate the increased memory usage?
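
For reference, a minimal sketch of how the comparison above could be reproduced across `pyarrow` versions. The table here is a synthetic placeholder standing in for `tbl_a`/`tbl_b`/`tbl_c`, and `psutil` is used only to read the process RSS; RSS after the call is a coarse proxy, since peak usage during the conversion may be higher.

```python
import os

import numpy as np
import psutil
import pyarrow as pa


def rss_gb() -> float:
    """Resident set size of the current process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9


# Synthetic stand-in for the real tables; adjust n_rows to approximate the real sizes.
n_rows = 50_000_000
tbl = pa.table({
    "a": np.random.rand(n_rows),
    "b": np.random.randint(0, 1_000, n_rows),
})

before = rss_gb()
# Same call as in the report above.
df = tbl.to_pandas(self_destruct=True, split_blocks=True)
after = rss_gb()

print(f"pyarrow {pa.__version__}: RSS {before:.1f} GB -> {after:.1f} GB")
```

Running this under 17.0.0 and 20.0.0 in otherwise identical environments should show whether the regression reproduces independently of the specific data.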