kevinjqliu commented on issue #3168: URL: https://github.com/apache/iceberg-python/issues/3168#issuecomment-4100364150
> The issue is that by default pyarrow loads the strings per row into memory, which blows up the memory. If we download the datafile and open it directly via pyarrow this behaviour can be reproduced.

👍 #2676 solves the memory issue for reading multiple batches. If reading a single batch blows up the memory, the only fix here is to use dictionary encoding.
