IamJeffG commented on issue #37630: URL: https://github.com/apache/arrow/issues/37630#issuecomment-1734421879
> maybe when users have too many columns

Maybe, but I think there must be more to it. [In my example](https://github.com/apache/arrow/issues/37820) I have a partitioned Parquet dataset on local disk, 8.6 GB in total, with 13 columns and 38,747 fragments. Writing this dataset to a new location on disk (i.e. to compact the fragments) consumes all 8 GB of RAM on my machine and then swaps to disk. I can't imagine that 13 columns, or even 13 × 38,747, takes upwards of 8 GB of memory.
