kr-hansen commented on issue #37989: URL: https://github.com/apache/arrow/issues/37989#issuecomment-2604517486
Hmmm, what version were you using @DatSplit? When working with very large data (DataFrames of ~200 GB), I continue to see memory crashes with `pyarrow` that I don't get with `fastparquet`. When writing out, `fastparquet` keeps my memory usage flat, while `pyarrow` has a huge spike of 3-5.5x the memory footprint of the actual data itself. This is with `pyarrow 19.0.0` for me.
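
For reference, here is a minimal sketch of how one might compare the two engines' memory behavior on the same DataFrame. The DataFrame size, column layout, and the use of `psutil` for RSS sampling are my assumptions for illustration, not taken from the original report; sampling RSS before and after the call is only a rough proxy, since the transient peak during the write would need sampling from a background thread to capture.

```python
# Rough comparison of resident memory around a parquet write with each
# engine. NOTE: the DataFrame here is a stand-in; the original report
# concerned ~200 GB of data, which this sketch does not reproduce.
import numpy as np
import pandas as pd
import psutil


def rss_mb() -> float:
    """Current resident set size of this process, in MB."""
    return psutil.Process().memory_info().rss / 1e6


# Hypothetical test data: 10M rows x 10 float columns (~800 MB).
df = pd.DataFrame(
    np.random.rand(10_000_000, 10),
    columns=[f"c{i}" for i in range(10)],
)

for engine in ("fastparquet", "pyarrow"):
    before = rss_mb()
    df.to_parquet(f"/tmp/test_{engine}.parquet", engine=engine)
    after = rss_mb()
    print(f"{engine}: RSS {before:.0f} MB -> {after:.0f} MB")
```
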
