IamJeffG commented on issue #37630: URL: https://github.com/apache/arrow/issues/37630#issuecomment-1734421879
> maybe when users have too many columns

Maybe, but I think there must be more to it. [In my example](https://github.com/apache/arrow/issues/37820) I have a partitioned Parquet dataset on local disk, 8.6 GB in total, with 13 columns and 38,747 fragments. Writing this dataset to a new location on disk (i.e. to compact the fragments) consumes all 8 GB of RAM on my machine and then swaps to disk. I can't imagine that 13 columns, or even 13 × 38,747, takes upwards of 8 GB of memory.
