stevenzwu edited a comment on issue #2900: URL: https://github.com/apache/iceberg/issues/2900#issuecomment-894383649
@ayush-san let's assume your TaskManager has 1 CPU and 4 GB of memory, and you run 10 disjoint pipelines (each with a parallelism of 1) in the same process, so you have 10 IcebergFileWriter tasks running in the same TaskManager. Each writer can buffer up to 128 MB for a Parquet row group, so the row-group memory alone can add up to over 1 GB (10 × 128 MB = 1280 MB). The actual number will probably be a few times that, because of other memory overhead in the Parquet writer. If you have the heap dump file, try analyzing it with Eclipse MAT. The dominator tree is quite useful for drilling down to the class that is holding on to the memory. Use the heap dump file generated by `-XX:+HeapDumpOnOutOfMemoryError`.
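The back-of-the-envelope math above can be sketched as a tiny helper. This is only an estimate under stated assumptions: it assumes every co-located writer holds a nearly full row group at the same moment, and `overhead_factor` is a hypothetical multiplier standing in for the "few times" of extra Parquet writer overhead, not a measured value. The function name is illustrative and not part of Iceberg or Flink.

```python
MB = 1024 * 1024  # bytes per MB


def estimate_row_group_buffer_bytes(num_writers: int,
                                    row_group_size_bytes: int,
                                    overhead_factor: float = 1.0) -> int:
    """Worst-case bytes held by open Parquet row groups in one TaskManager.

    Hypothetical helper. Assumes all writers have a nearly full row group
    at the same time; overhead_factor is an assumed multiplier for other
    Parquet writer overhead (1.0 means row-group buffers only).
    """
    if num_writers < 0 or row_group_size_bytes < 0:
        raise ValueError("writer count and row group size must be non-negative")
    if overhead_factor < 1.0:
        raise ValueError("overhead_factor must be at least 1.0")
    return int(num_writers * row_group_size_bytes * overhead_factor)


if __name__ == "__main__":
    taskmanager_memory = 4 * 1024 * MB  # the 4 GB TaskManager from the example
    # Try a few assumed overhead multipliers for 10 writers x 128 MB row groups.
    for factor in (1.0, 2.0, 3.0):
        need = estimate_row_group_buffer_bytes(10, 128 * MB, factor)
        share = need / taskmanager_memory
        print(f"overhead x{factor:.0f}: {need // MB} MB, "
              f"{share:.2f} of the 4 GB TaskManager")
```

With a factor of 1.0 this reproduces the 1280 MB figure; at an assumed factor of 3.0 the writers alone would need 3840 MB, most of a 4 GB TaskManager before Flink's own memory is counted. Shrinking the row group size (for Iceberg tables this is, assuming a recent version, the `write.parquet.row-group-size-bytes` table property) lowers the estimate proportionally.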
