jorisvandenbossche commented on issue #40301: URL: https://github.com/apache/arrow/issues/40301#issuecomment-2045407817
> Is it a practical concern?

I think the original reported case (converting a tiny table of a few kilobytes to pandas can cause a spike of several hundred MBs in memory usage) is something people can certainly run into. And although it will often not be a concern (when working with smaller data, memory usage is typically not an issue, and when actually working with larger tables, where memory usage does become relevant, this overhead disappears), it is definitely surprising and can lead to confusion. So I think it is worth "fixing".

But the potential fix I was thinking of could also be something much simpler, like deciding via some heuristic not to do the conversion in parallel for smaller data. For the conversion in the other direction (pandas -> pyarrow), we actually already have such a heuristic (in Python): https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/pandas_compat.py#L573-L581
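A minimal sketch of what such a heuristic could look like. The function name `should_use_threads` and the 1 MiB threshold are illustrative assumptions, not pyarrow internals or the heuristic linked above:

```python
# Hypothetical sketch: skip parallel conversion below a size threshold.
# The threshold value and helper are illustrative, not part of pyarrow.

SMALL_TABLE_THRESHOLD = 1 << 20  # 1 MiB; assumed cutoff, would need tuning


def should_use_threads(table_nbytes: int, num_columns: int) -> bool:
    """Heuristic: only parallelize when the table is large enough that
    per-thread overhead (and the associated memory spike) pays off."""
    if table_nbytes < SMALL_TABLE_THRESHOLD:
        # Tiny tables: single-threaded conversion avoids spawning a
        # thread pool for work that finishes in microseconds anyway.
        return False
    # Column-parallel conversion only helps with more than one column.
    return num_columns > 1
```

The result could then be fed into the existing `use_threads` argument, e.g. `table.to_pandas(use_threads=should_use_threads(table.nbytes, table.num_columns))`, leaving the default behavior unchanged for large tables.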
