JB-data edited a comment on issue #4055:
URL: https://github.com/apache/hudi/issues/4055#issuecomment-975241019
@xushiyan
Thanks for the reply. Indeed, the SQL should say FROM <SRC> ... I will correct it.
Where exactly should we check the application stacktrace? In both the YARN
application logs and the Spark UI stacktrace, I can only find that the
container exited with code 134 or 143 (it varies), but not exactly why. The only
place where I can infer where this happens is the stage in the Spark UI that
mentions:
isEmpty at DeltaSync.java:344
so I am assuming it comes from there? Here is another part of the YARN
application logs:
```
[2021-11-19 13:44:57.189]Container exited with a non-zero exit code 134.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 7907 Aborted
LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hadoop/../../../CDH-7.2.2-1.cdh7.2.2.p3.7839477/lib/hadoop/lib/native:"
/usr/lib/jvm/java/bin/java -server -Xmx12288m
'-Djava.security.auth.login.config=./client.jaas'
-Djava.io.tmpdir=/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/tmp
'-Dspark.driver.port=33466' '-Dspark.network.crypto.enabled=false'
'-Dspark.authenticate=false' '-Dspark.shuffle.service.port=7337'
'-Dspark.ui.port=0'
-Dspark.yarn.app.container.log.dir=/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009
-XX:OnOutOfMemoryError='kill %p'
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
spark://CoarseGrainedScheduler@someworker0.:33466 --executor-id 8 --hostname
someworkerr1 --cores 1 --app-id application_1632244056069_1301
--user-class-path
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/__app__.jar
--user-class-path
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hive-service-3.1.3000.7.2.2.3-1.jar
--user-class-path
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hive-jdbc-3.1.3000.7.2.2.3-1.jar
--user-class-path
file:/hadoopfs/fs1/nodemanager/usercache/myuser/appcache/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/hbase-client.jar
>
/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/stdout
2>
/hadoopfs/fs1/nodemanager/log/application_1632244056069_1301/container_e29_1632244056069_1301_01_000009/stderr
Last 4096 bytes of stderr :
Last 4096 bytes of stderr :
ed but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.partition_fields' was supplied but isn't a known
config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.schemaprovider.source.schema.file' was supplied but isn't
a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.transformer.sql' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.enable' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.assume_date_partitioning' was supplied but isn't a
known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.useJdbc' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.schemaprovider.target.schema.file' was supplied but isn't
a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known
config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.jdbcurl' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.database' was supplied but isn't a known config.
21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka version: 2.4.1.7.1.1.0-565
21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka commitId: 7a66ef499adcd5b9
21/11/19 13:44:52 INFO utils.AppInfoParser: Kafka startTimeMs: 1637329492101
21/11/19 13:44:52 INFO consumer.KafkaConsumer: [Consumer
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi]
Subscribed to partition(s): fr24messages-0
21/11/19 13:44:52 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:52 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
21/11/19 13:44:52 INFO codegen.CodeGenerator: Code generated in 317.609485 ms
21/11/19 13:44:52 INFO codegen.CodeGenerator: Code generated in 19.735724 ms
21/11/19 13:44:52 INFO kafka010.InternalKafkaConsumer: Initial fetch for
spark-executor-hudi fr24messages-0 10254239
21/11/19 13:44:52 INFO consumer.KafkaConsumer: [Consumer
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi] Seeking
to offset 10254239 for partition fr24messages-0
21/11/19 13:44:53 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:53 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
21/11/19 13:44:53 INFO clients.Metadata: [Consumer
clientId=consumer-spark-executor-hudi-1, groupId=spark-executor-hudi] Cluster
ID: AFejYh8dRyaRfFU4x5PeMA
21/11/19 13:44:54 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:54 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
21/11/19 13:44:54 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:54 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
21/11/19 13:44:55 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:55 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
21/11/19 13:44:56 INFO codegen.CodeGenerator: Code generated in 208.118952 ms
21/11/19 13:44:56 INFO codegen.CodeGenerator: Code generated in 91.06291 ms
21/11/19 13:44:56 INFO executor.CoarseGrainedExecutorBackend: Received
tokens of 4197 bytes
21/11/19 13:44:56 INFO deploy.SparkHadoopUtil: Updating delegation tokens
for current user.
.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1931)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1919)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1918)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1918)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:953)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2152)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2101)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2090)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:764)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2103)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2122)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1409)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.take(RDD.scala:1382)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1517)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1517)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1516)
at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:344)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:233)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:161)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:159)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:464)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:665)
....
...
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 1.0 failed 8 times, most recent failure: Lost task 0.7 in stage 1.0 (TID
8, myworker1, executor 8): ExecutorLostFailure (executor 8 exited caused by one
of the running tasks) Reason: Container from a bad node:
container_e29_1632244056069_1301_01_000009 on host: myworker1. Exit status:
134. Diagnostics: 2 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.transformer.sql' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.enable' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.assume_date_partitioning' was supplied but isn't a
known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.useJdbc' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.deltastreamer.schemaprovider.target.schema.file' was supplied but isn't
a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known
config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.jdbcurl' was supplied but isn't a known config.
21/11/19 13:44:52 WARN consumer.ConsumerConfig: The configuration
'hoodie.datasource.hive_sync.database' was supplied but isn't a known config.
```
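A side note on the two exit codes, as a rough sketch rather than a diagnosis: by shell convention an exit status above 128 means the process was ended by signal number (status - 128). On that reading, 134 would map to signal 6 (SIGABRT, which matches the `Aborted` line in prelaunch.err) and 143 to signal 15 (SIGTERM). The helper name `describe_exit_code` below is made up for illustration and is not part of Spark, YARN, or Hudi; the signal names assume a Linux host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description.

    Hypothetical helper: assumes the shell convention that an exit
    status above 128 encodes (128 + signal number).
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number is not defined on this platform.
            name = f"unknown signal {signum}"
        return f"exit code {code}: terminated by {name} (signal {signum})"
    return f"exit code {code}: process exited on its own"


# The two exit codes reported for the lost executors.
for code in (134, 143):
    print(describe_exit_code(code))
```

This only names the signal; it does not say what raised it. SIGABRT is typically raised from inside the process itself (for example a JVM or native-library abort), while SIGTERM is what a plain `kill` sends, so 143 could come from the `-XX:OnOutOfMemoryError='kill %p'` option visible in the launch command or from YARN stopping the container. The logs above do not settle which.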