kingkongpoon edited a comment on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-783209847


   > To help investigate better
   > 
   > * Can you post the configs you used to write to hudi.
   > * Can you post a screen shot of spark stages. So that we know where its 
failing and can relate to some configs used.
   > * Can you give some rough idea of your dataset record keys. Is it 
completely random or does it have some ordering to it. what it is made of.
   > * I assume you are using regular bloom as index type.
   
   My Spark write configuration:
   ```
   input
         .write.format("org.apache.hudi")
         .option("hoodie.cleaner.commits.retained", 1)
         .option("hoodie.keep.min.commits", 2)
         .option("hoodie.keep.max.commits", 3)
         .option("hoodie.insert.shuffle.parallelism", 30)
         .option("hoodie.upsert.shuffle.parallelism", 30)
         .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
         .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uuid")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, 
"etl_modify_time")
         .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, 
"created_year,created_month,created_day,brand_id")
         .option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY, 
classOf[DefaultHoodieRecordPayload].getName)
         .option(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP, 
"etl_modify_time")
         .option("hoodie.table.name", "std_order") 
         .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, hiveserver2)
         .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "dwd_std")
         .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "std_order")
         .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, 
classOf[ComplexKeyGenerator].getName)
         .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, 
"created_year,created_month,created_day,brand_id")
         .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
classOf[MultiPartKeysValueExtractor].getName)
         .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
         .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
         .option(HoodieIndexConfig.INDEX_TYPE_PROP, 
HoodieIndex.IndexType.GLOBAL_BLOOM.name())
         .mode(SaveMode.Overwrite)
   //    .mode(SaveMode.Append)
         .save(basePath)
   
   ```
   ```
   spark-submit --master yarn --driver-memory 4G --executor-memory 8G \
     --executor-cores 4 --num-executors 10 \
     --conf spark.executor.memoryOverhead=4G \
     --conf spark.yarn.max.executor.failures=100 \
     --class com.qmtec.peony.newcrm.hudi.process \
     --jars hudi-hadoop-mr-bundle-0.7.0.jar,hudi-hive-sync-bundle-0.7.0.jar,hudi-spark-bundle_2.11-0.7.0.jar \
     qmtec-peony-etl-hudi-1.0.jar
   ```
   
   uuid is the tid from the order table data, and it is unique. When I first write the data to HDFS with `.mode(SaveMode.Overwrite)`, the Hive table is created successfully and the file in HDFS is about 520 MB.
   But when I run the same code, configuration, and data with `.mode(SaveMode.Append)`, the process throws these errors:
   ```
   21/02/22 15:57:43 ERROR [dispatcher-event-loop-5] YarnScheduler: Lost 
executor 4 on emr-worker-2.cluster-47763: Container from a bad node: 
container_e10_1610102487810_52481_01_000005 on host: 
emr-worker-2.cluster-47763. Exit status: 137. Diagnostics: Container killed on 
request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:57:45 ERROR [dispatcher-event-loop-7] YarnScheduler: Lost 
executor 5 on emr-worker-4.cluster-47763: Container from a bad node: 
container_e10_1610102487810_52481_01_000006 on host: 
emr-worker-4.cluster-47763. Exit status: 137. Diagnostics: Container killed on 
request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:58:12 ERROR [dispatcher-event-loop-2] YarnScheduler: Lost 
executor 7 on emr-worker-4.cluster-47763: Container from a bad node: 
container_e10_1610102487810_52481_01_000009 on host: 
emr-worker-4.cluster-47763. Exit status: 137. Diagnostics: Container killed on 
request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:58:31 ERROR [dispatcher-event-loop-4] YarnScheduler: Lost 
executor 8 on emr-worker-4.cluster-47763: Container from a bad node: 
container_e10_1610102487810_52481_01_000010 on host: 
emr-worker-4.cluster-47763. Exit status: 1. Diagnostics: Exception from 
container-launch.
   Container id: container_e10_1610102487810_52481_01_000010
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
        at org.apache.hadoop.util.Shell.run(Shell.java:869)
        at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
        at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   
   
   Container exited with a non-zero exit code 1
   
   ```
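   (For context on the errors above: exit status 137 is 128 + SIGKILL, which on YARN usually means the NodeManager or the OS OOM killer terminated the container, most often for exceeding its memory limit. One way to confirm this is to pull the container logs; the application id below is inferred from the container ids in the log, so substitute the real one if it differs:)
   ```
   # Fetch the aggregated logs for the application (id inferred from the
   # container ids above, e.g. container_e10_1610102487810_52481_01_000005)
   # and look for kill/memory-related messages.
   yarn logs -applicationId application_1610102487810_52481 \
     | grep -iE "kill|memory|limit"
   ```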
   But sometimes it runs successfully; it will then have two parquet files, and each parquet file is also about 520 MB.
   The table root path also contains a `.hoodie` directory, and each time I run the job this directory grows bigger.
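   (One way to see which part of the table is actually growing, assuming a plain HDFS layout where `/path/to/basePath` is a placeholder for the table root, is:)
   ```
   # Show per-directory sizes under the table root. The .hoodie directory
   # holds the timeline (commit metadata and archived instants), which grows
   # with every commit until cleaning/archival trims it.
   hdfs dfs -du -h /path/to/basePath
   hdfs dfs -du -h /path/to/basePath/.hoodie
   ```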

