stevenzwu commented on issue #2900: URL: https://github.com/apache/iceberg/issues/2900#issuecomment-893575995
I am not surprised to see `org.apache.flink.runtime.taskmanager.Task` there, as it is the worker thread. I often find the dominator tree in Eclipse MAT very useful for understanding the memory footprint: https://www.eclipse.org/mat/about/dominator_tree.png

Reading the original description, I have some questions about this setup: "We have combined all the tables of a single DB in one job."

1. Are the tables partitioned by event time or ingestion time?
2. Do you run multiple writer tasks/threads on the same core? E.g., assuming your DB has 10 tables, does each slot/core run 10 writers or only 1 writer?

If each taskmanager has too many writer tasks, memory consumption by Parquet is going to be a problem.
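To make the concern about writer count concrete, here is a rough back-of-the-envelope sketch. It assumes each open Parquet file can buffer up to one row group in memory before flushing, and it uses 128 MiB as the row-group size (a common Parquet default; check your own table settings). The function name, its parameters, and the 4-slot layout are hypothetical and only for illustration. This is an upper-bound estimate, not a measurement of actual heap usage.

```python
# Rough upper-bound estimate, not a measurement.
# Assumption: each open Parquet file may buffer up to one row group in memory.
ROW_GROUP_BYTES = 128 * 1024 * 1024  # assumed row-group size; verify against your config
MIB = 1024 * 1024


def estimate_parquet_buffer_bytes(writers_per_slot, slots_per_taskmanager,
                                  open_files_per_writer=1,
                                  row_group_bytes=ROW_GROUP_BYTES):
    """Worst-case bytes buffered by Parquet writers on one taskmanager.

    open_files_per_writer is a hypothetical knob: a writer that receives rows
    for several partitions at once (e.g. late data with event-time
    partitioning) may keep more than one file open.
    """
    open_files = writers_per_slot * slots_per_taskmanager * open_files_per_writer
    return open_files * row_group_bytes


# Hypothetical layout: a DB with 10 tables, 4 slots per taskmanager.
one_writer_per_slot = estimate_parquet_buffer_bytes(1, 4)
ten_writers_per_slot = estimate_parquet_buffer_bytes(10, 4)

print(one_writer_per_slot // MIB, "MiB")   # 512 MiB
print(ten_writers_per_slot // MIB, "MiB")  # 5120 MiB
```

Under these assumptions, running all 10 table writers in every slot raises the worst-case buffer from about 0.5 GiB to about 5 GiB per taskmanager, before counting multiple open partitions per writer. That is why the answers to the two questions above matter.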
