tverdokhlebd opened a new issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records
URL: https://github.com/apache/incubator-hudi/issues/1491

Hello. I have the following config:

--
docker run --rm \
  -v /var/lib/jenkins-slave/workspace/Transfer_ml_data_to_s3_hudi:/var/lib/jenkins-slave/workspace/Transfer_ml_data_to_s3_hudi \
  -v /mnt/ml_data:/mnt/ml_data \
  bde2020/spark-master:2.4.5-hadoop2.7 \
  bash ./spark/bin/spark-submit \
  --master 'local[2]' \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.hadoop:hadoop-aws:2.7.3,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf spark.local.dir=/mnt/ml_data \
  --conf spark.ui.enabled=false \
  --conf spark.driver.memory=4g \
  --conf spark.driver.memoryOverhead=1024 \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.kryoserializer.buffer.max=512m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.rdd.compress=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.executorEnv.hudi.outputPath=s3a://ir-mtu-ml-bucket/ml_hudi \
  --conf spark.executorEnv.hudi.tableName=ext_ml_data \
  --conf spark.executorEnv.hudi.recordKey=tds_cid \
  --conf spark.executorEnv.hudi.precombineKey=hit_timestamp \
  --conf spark.executorEnv.hudi.parallelism=8 \
  --conf spark.executorEnv.hudi.bulkInsertParallelism=8 \
  --class mtu.spark.analytics.ExtMLDataToS3
--

I do the following:
1. Bulk insert 30M records from Vertica into S3 storage;
2. Upsert the same 30M records from Vertica into the same S3 storage.

Across three attempts I received different errors:
- java.util.concurrent.TimeoutException: Cannot receive any reply from c3dcd5b0c2ab:41498 in 10000 milliseconds
- java.lang.OutOfMemoryError: Java heap space
- java.lang.OutOfMemoryError: GC overhead limit exceeded

I have read the "Tuning Guide" and tried tuning several settings: memory fractions (decreased from the default values), off-heap parameters, the garbage collector, parallelism, etc.
I also increased the driver memory from 4 GB to 10 GB, but that did not help either.

Stack trace for the heap space error: https://drive.google.com/open?id=1-Kerkt2j-z_0zXal01rha55j1NzsxYto
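
For reference, the two steps above would roughly correspond to the Hudi write options sketched below. This is only an illustration under assumptions: the source of mtu.spark.analytics.ExtMLDataToS3 is not shown here, so it is assumed that the job reads the spark.executorEnv.hudi.* values and passes them to the Hudi Spark datasource. The helper name hudi_write_options is hypothetical; the hoodie.* keys are the standard Hudi datasource and write-client configuration keys.

```python
def hudi_write_options(table_name, record_key, precombine_key, operation, parallelism):
    """Build an option map for a Hudi Spark datasource write.

    Hypothetical helper for illustration; not taken from the actual job.
    """
    # Each write operation has its own shuffle-parallelism key.
    parallelism_keys = {
        "bulk_insert": "hoodie.bulkinsert.shuffle.parallelism",
        "upsert": "hoodie.upsert.shuffle.parallelism",
    }
    if operation not in parallelism_keys:
        raise ValueError("unsupported operation: %s" % operation)
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_key,
        parallelism_keys[operation]: str(parallelism),
    }


# Values taken from the spark-submit command above.
bulk_insert_opts = hudi_write_options(
    "ext_ml_data", "tds_cid", "hit_timestamp", "bulk_insert", 8)
upsert_opts = hudi_write_options(
    "ext_ml_data", "tds_cid", "hit_timestamp", "upsert", 8)
```

In a PySpark job these maps would typically be applied as df.write.format("org.apache.hudi").options(**upsert_opts).mode("append").save("s3a://ir-mtu-ml-bucket/ml_hudi"). That call is likewise an assumption about how the job writes, not its actual code.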
