zbs commented on issue #45638: URL: https://github.com/apache/arrow/issues/45638#issuecomment-2689678138
> From the footprint above, it seems that metadata of written row groups consumes 16.25% of the total memory. Is something else dominating the memory consumption?

AFAICT nothing is dominating it. I will increase the input subset used and run again. It's possible that my subsetting has reduced the severity of the bug.

> Did you write a large number of row groups and columns?

Yes, dozens of files, each with around 10K batches, totaling tens of millions of rows. In many cases the batches contain 1 row.

I disabled dictionary and statistics and will report back.

Re: better APIs, our entire environment needs to upgrade at the same time. If this is still not resolved when that happens in a couple of months, hopefully these APIs will solve it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
