stevenzwu commented on issue #2900: URL: https://github.com/apache/iceberg/issues/2900#issuecomment-894383649
@ayush-san let's assume your taskmanager has 1 CPU and 4 GB of memory and you run 10 disjoint pipelines (each with a parallelism of 1) in the same process, so you have 10 IcebergFileWriter tasks running in the same taskmanager. Each writer can use 128 MB for the Parquet row group size. I am sure the real footprint per writer is a few times the 128 MB row group size, so the memory usage can add up. If you have the heap dump file, try opening it with Eclipse MAT. The dominator tree is quite useful for drilling down to the class that is holding on to the memory. Use the heap dump file generated by `-XX:+HeapDumpOnOutOfMemoryError`.
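To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. The function name and the overhead factor of 3 are assumptions chosen only for illustration (the comment says "a few Xs" without giving a number), and the estimate ignores everything else that lives on the heap.

```python
def estimated_writer_memory_mb(num_writers, row_group_mb=128, overhead_factor=3.0):
    """Rough estimate of memory held by Parquet writers in one taskmanager.

    overhead_factor is a hypothetical multiplier standing in for the
    "few Xs of overhead" on top of the configured row group size.
    """
    return num_writers * row_group_mb * overhead_factor


taskmanager_mb = 4 * 1024  # 4 GB taskmanager from the example above

# Row group buffers alone, with no overhead at all.
raw_mb = estimated_writer_memory_mb(10, overhead_factor=1.0)

# Same 10 writers with the assumed 3x overhead.
padded_mb = estimated_writer_memory_mb(10, overhead_factor=3.0)

print(f"row groups only: {raw_mb:.0f} MB of {taskmanager_mb} MB")
print(f"with assumed 3x overhead: {padded_mb:.0f} MB of {taskmanager_mb} MB")
```

Under these assumptions, the row group buffers alone take 1280 MB, and with a 3x overhead the writers would need about 3840 MB, which is close to the whole 4 GB before counting anything else in the process.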
