MikeBuh commented on issue #5481: URL: https://github.com/apache/hudi/issues/5481#issuecomment-1116990374
Hi @yihua, the batches I am trying to load are around 9 GB each. For the latest test I tried to load only 2 of these batches, but not even one of them was processed successfully. I made 2 runs and both failed with an executor running out of memory. Both runs used the same Spark resources; the only difference was the parallelism (both for Spark and Hudi):

**Common Spark Parameters**

> spark.driver.cores: 5
> spark.driver.memory: 24100m
> spark.driver.memoryOverhead: 2680m
>
> spark.executor.instances: 10
> spark.executor.cores: 5
> spark.executor.memory: 24100m
> spark.executor.memoryOverhead: 2680m
> spark.memory.storageFraction: 0.6
> spark.memory.fraction: 0.7
>
> spark.kryoserializer.buffer.max: 1024m
>
> spark.driver.extraJavaOptions: -Xloggc:/var/log/spark-GClog.log -XX:+PrintGC -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
>
> spark.executor.extraJavaOptions: -Xloggc:/var/log/spark-GClog.log -XX:+PrintGC -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'

**Run 1 Parallelism**

> spark.default.parallelism: 100
> spark.sql.shuffle.partitions: 100
> hoodie.upsert.shuffle.parallelism: 100

**Run 2 Parallelism**

> spark.default.parallelism: 250
> spark.sql.shuffle.partitions: 250
> hoodie.upsert.shuffle.parallelism: 250

Given the above and the persisting failures, might any of the following affect the performance and/or have anything to do with the required resources?
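To make the comparison concrete: the two runs share every setting except three parallelism knobs. A minimal sketch collecting them as plain config dictionaries (the `parallelism_conf` helper is mine for illustration; actually applying these via `spark-submit --conf` or a `SparkSession` builder is implied, not shown):

```python
# Settings shared by both runs (copied from the comment above).
COMMON_CONF = {
    "spark.executor.instances": "10",
    "spark.executor.cores": "5",
    "spark.executor.memory": "24100m",
    "spark.executor.memoryOverhead": "2680m",
    "spark.memory.fraction": "0.7",
    "spark.memory.storageFraction": "0.6",
}

def parallelism_conf(n: int) -> dict:
    """Build the three parallelism settings that varied between runs."""
    return {
        "spark.default.parallelism": str(n),
        "spark.sql.shuffle.partitions": str(n),
        "hoodie.upsert.shuffle.parallelism": str(n),
    }

run1 = {**COMMON_CONF, **parallelism_conf(100)}  # Run 1
run2 = {**COMMON_CONF, **parallelism_conf(250)}  # Run 2
```

Both runs failed the same way, which suggests raising shuffle parallelism alone did not change the per-task memory pressure enough.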
- size of the target table: I noticed that reloading the same batches into a near-empty table is more successful
- file sizes: maybe having fewer but larger files in the target table can help when comparing and updating
- compaction and cleanup: if these are heavy operations that need lots of memory, then perhaps they can be tweaked

Thanks once again for your reply
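The three points above map onto standard Hudi write configs. A hedged sketch of the knobs one might try (the config keys are standard Hudi write options; the values are illustrative starting points only, not validated against this workload):

```python
# Illustrative Hudi write options touching the three areas above.
# Keys are standard Hudi write configs; values are example starting
# points, not recommendations verified for this table.

file_sizing_opts = {
    # Fewer but larger base files: raise the target parquet file size
    # and the small-file limit so upserts pack into larger files.
    "hoodie.parquet.max.file.size": str(256 * 1024 * 1024),    # 256 MB
    "hoodie.parquet.small.file.limit": str(128 * 1024 * 1024), # 128 MB
}

memory_opts = {
    # Fraction of executor memory the spillable merge map may use
    # before spilling to disk during upsert merge / compaction.
    "hoodie.memory.merge.fraction": "0.6",
    "hoodie.memory.compaction.fraction": "0.6",
}

cleaner_opts = {
    # Retain fewer commits so the cleaner has less work per run.
    "hoodie.cleaner.commits.retained": "10",
}

# These would be passed as .options(**opts) on the DataFrame writer.
```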
