kevinjqliu commented on issue #3168: URL: https://github.com/apache/iceberg-python/issues/3168#issuecomment-4100364150
> The issue is that by default pyarrow loads the strings per row into memory, which blows up the memory. If we download the datafile and open it directly via pyarrow this behaviour can be reproduced.

👍 #2676 solves the memory issue for reading multiple batches. If reading a single batch blows up the memory, the only fix here is to use dictionary encoding.
