JB-data edited a comment on issue #4055:
URL: https://github.com/apache/hudi/issues/4055#issuecomment-975241019


   @xushiyan 
   Thanks for the reply. Indeed, the query in my ticket was wrong; the SQL really 
says FROM < SRC > ... I have just corrected it (so the job has always run with the 
correct statement).
   
   Where exactly should we check the application stacktrace? In both the 
YARN application logs and the Spark UI stacktrace I only find the fact that the 
container exited with code 134 or 143 (it varies), but not exactly why. The only 
place where I can infer where this happens is the stage in the Spark UI that 
mentions: 
   isEmpty at DeltaSync.java:344
   so I am assuming it comes from there? Here is another part of the YARN 
application logs:
   
   
   ```
   [2021-11-19 13:44:57.189]Container exited with a non-zero exit code 134. 
Error file: prelaunch.err.
   Last 4096 bytes of prelaunch.err :
   /bin/bash: line 1:  7907 Aborted                 
LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hadoop/../../../CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hadoop/lib/native:"
 /usr/lib/jvm/java/bin/java -server -Xmx12288m 
'-Djava.security.auth.login.config=./client.jaas' 
-Djava.io.tmpdir=/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/tmp
 '-Dspark.driver.port=33466' '-Dspark.network.crypto.enabled=false' 
'-Dspark.authenticate=false' '-Dspark.shuffle.service.port=7337' 
'-Dspark.ui.port=0' 
-Dspark.yarn.app.container.log.dir=/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009
 -XX:OnOutOfMemoryError='kill %p' 
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
spark://CoarseGrainedScheduler@someworker0.:33466 --executor-id 8 --hostname 
someworkerr1 --cores 1 --app-id application_1632244056069_1301 
--user-class-path file:/
 
hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/__app__.jar
 --user-class-path 
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hive-service-3.1.3000.7.2.2.3-1.jar
 --user-class-path 
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hive-jdbc-3.1.3000.7.2.2.3-1.jar
 --user-class-path 
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hbase-client.jar
 > 
/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/stdout
 2> 
/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/stderr
   Last 4096 bytes of stderr :
   ed but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.partition_fields' was supplied but isn't a known 
config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.schemaprovider.source.schema.file' was supplied but isn't 
a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.transformer.sql' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.enable' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.assume_date_partitioning' was supplied but isn't a 
known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.useJdbc' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.schemaprovider.target.schema.file' was supplied but isn't 
a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known 
config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.jdbcurl' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.database' was supplied but isn't a known config.
   21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka version: 2.4.1.7.1.1.0-565
   21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka commitId: 7a66ef499adcd5b9
   21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka startTimeMs: 1637329492101
   21/11/19 13:44:52 INFO consumer.KafkaConsumer: [Consumer 
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi] 
Subscribed to partition(s): fr24messages-0
   21/11/19 13:44:52 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:52 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   21/11/19 13:44:52 INFO codegen.CodeGenerator: Code generated in 317.609485 ms
   21/11/19 13:44:52 INFO codegen.CodeGenerator: Code generated in 19.735724 ms
   21/11/19 13:44:52 INFO kafka010.InternalKafkaConsumer: Initial fetch for 
spark-executor-hudi fr24messages-0 10254239
   21/11/19 13:44:52 INFO consumer.KafkaConsumer: [Consumer 
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi] Seeking 
to offset 10254239 for partition fr24messages-0
   21/11/19 13:44:53 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:53 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   21/11/19 13:44:53 INFO clients.Metadata: [Consumer 
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi] Cluster 
ID: AFejYh8dRyaRfFU4x5PeMA
   21/11/19 13:44:54 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:54 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   21/11/19 13:44:54 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:54 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   21/11/19 13:44:55 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:55 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   21/11/19 13:44:56 INFO codegen.CodeGenerator: Code generated in 208.118952 ms
   21/11/19 13:44:56 INFO codegen.CodeGenerator: Code generated in 91.06291 ms
   21/11/19 13:44:56 INFO executor.CoarseGrainedExecutorBackend: Received 
tokens of 4197 bytes
   21/11/19 13:44:56 INFO deploy.SparkHadoopUtil: Updating delegation tokens 
for current user.
   
   
   .
   Driver stacktrace:
     at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1931)
     at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1919)
     at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1918)
     at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
     at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1918)
     at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
     at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
     at scala.Option.foreach(Option.scala:257)
     at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:953)
     at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2152)
     at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2101)
     at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2090)
     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:764)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2103)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2122)
     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1409)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
     at org.apache.spark.rdd.RDD.take(RDD.scala:1382)
     at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1517)
     at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
     at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
     at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1516)
     at 
org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
     at 
org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
     at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:344)
     at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:233)
     at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:161)
     at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
     at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:159)
     at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:464)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:665)
   ....
   ...
   Driver stacktrace:
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 1.0 failed 8 times, most recent failure: Lost task 0.7 in stage 1.0 (TID 
8, myworker1, executor 8): ExecutorLostFailure (executor 8 exited caused by one 
of the running tasks) Reason: Container from a bad node: 
container_e29_1632244056069_1301_01_000009 on host: myworker1. Exit status: 
134. Diagnostics: 2 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.transformer.sql' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.enable' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.assume_date_partitioning' was supplied but isn't a 
known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.useJdbc' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.deltastreamer.schemaprovider.target.schema.file' was supplied but isn't 
a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known 
config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.jdbcurl' was supplied but isn't a known config.
   21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration 
'hoodie.datasource.hive_sync.database' was supplied but isn't a known config.
   
   
   ```
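
   For what it's worth, container exit codes above 128 conventionally encode a fatal Unix signal (exit code = 128 + signal number), so 134 and 143 can be decoded without the full stacktrace. A minimal sketch of that decoding (the interpretation comments are general Unix/YARN conventions, not something taken from these specific logs):

   ```python
   import signal

   def decode_exit_code(code: int):
       """Decode a container exit code into a signal name, if it encodes one.

       Unix convention: a process killed by signal N exits with code 128 + N.
       """
       if code > 128:
           sig = code - 128
           try:
               return signal.Signals(sig).name
           except ValueError:
               return f"unknown signal {sig}"
       return None

   # 134 - 128 = 6  -> SIGABRT: the process aborted itself (often native
   #                   memory exhaustion or a JVM/glibc abort)
   # 143 - 128 = 15 -> SIGTERM: the process was asked to terminate (often
   #                   YARN killing a container that exceeded memory limits)
   print(decode_exit_code(134))
   print(decode_exit_code(143))
   ```

   Under that reading, 143 usually points at YARN enforcing container memory limits, while 134 points at a native-level abort inside the executor; checking the NodeManager logs on the failing host for memory-limit kills may narrow it down.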

