lenhardtx opened a new issue, #8159:
URL: https://github.com/apache/hudi/issues/8159
**Environment Description**

* Hudi version : 0.13.0
* Spark version : 3.3.2
* Hive version : apache-hive-4
* Hadoop version :
* Storage (HDFS/S3/GCS..) : MinIO
* Running on Docker? (yes/no) : no
**Additional context**
Job:
[root@spark-hudi ~]# cat testeHudi_v3.sh
#!/bin/bash
spark-submit --name "hudi_sbr_ped_venda" --master spark://0.0.0.0:7077 \
  --deploy-mode client --driver-memory 1G \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --jars "/root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar,/root/hudi/spark-avro_2.13-3.3.2.jar" \
  --conf spark.executor.memory=2g --conf spark.cores.max=100 \
  --conf 'spark.hadoop.fs.s3a.access.key=admin' \
  --conf 'spark.hadoop.fs.s3a.secret.key=XXXXXXX' \
  --conf 'spark.hadoop.fs.s3a.endpoint=http://0.0.0.0:9000' \
  --conf 'spark.hadoop.fs.s3a.path.style.access=true' \
  --conf 'fs.s3a.signing-algorithm=S3SignerType' \
  --conf 'spark.sql.catalogImplementation=hive' \
  --conf 'spark.debug.maxToStringFields=500' \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar \
  --table-type COPY_ON_WRITE --op UPSERT \
  --target-base-path s3a://hudi/sbr_ped_venda \
  --target-table sbr_ped_venda --continuous \
  --min-sync-interval-seconds 60 \
  --source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
  --source-ordering-field _event_lsn \
  --payload-class org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload \
  --hoodie-conf schema.registry.url=http://0.0.0.0:8081 \
  --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://0.0.0.0:8081/subjects/pgprd.public.sbr_ped_venda-value/versions/latest \
  --hoodie-conf bootstrap.servers=0.0.0.0:9092 \
  --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=pgprd.public.sbr_ped_venda \
  --hoodie-conf hoodie.deltastreamer.source.kafka.group.id=datalake \
  --hoodie-conf auto.offset.reset=earliest \
  --hoodie-conf group.id=datalake \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id \
  --hoodie-conf validate.non.null=false \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
  --hoodie-conf hoodie.datasource.write.hive_style_partitioning=false
I've tried several variations of this command; the script above is the latest test I ran.
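
For context on the jars involved: the Scala binary version is encoded in each artifact name, so a quick listing is enough to see whether everything on the classpath targets the same Scala line. This is only a sketch, assuming a standard Spark 3.3.2 binary distribution under $SPARK_HOME:

# Sketch: check which Scala binary version each artifact targets
# (assumes a standard Spark 3.3.2 distribution under $SPARK_HOME).

# Scala runtime shipped with the Spark distribution:
ls "$SPARK_HOME/jars" | grep '^scala-library'

# Scala suffix of the jars passed via --jars:
ls /root/hudi/
# hudi-utilities-bundle_2.12-0.13.0.jar  -> built for Scala 2.12
# spark-avro_2.13-3.3.2.jar              -> built for Scala 2.13

If the Spark distribution is the default Scala 2.12 build, the matching spark-avro artifact would presumably be spark-avro_2.12-3.3.2.jar rather than the _2.13 one; I'm noting this only as a check, not as a confirmed cause.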

**Stacktrace**
[root@spark-hudi ~]# ./testeHudi_v3.sh
Warning: Ignoring non-Spark config property: fs.s3a.signing-algorithm
:: loading settings :: url =
jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies ::
org.apache.spark#spark-submit-parent-5dcabf23-e567-4306-b4b1-acb277c8e07d;1.0
confs: [default]
found org.apache.hadoop#hadoop-aws;3.3.4 in central
found com.amazonaws#aws-java-sdk-bundle;1.12.262 in central
found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
:: resolution report :: resolve 466ms :: artifacts dl 25ms
:: modules in use:
com.amazonaws#aws-java-sdk-bundle;1.12.262 from central in [default]
org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in
[default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving ::
org.apache.spark#spark-submit-parent-5dcabf23-e567-4306-b4b1-acb277c8e07d
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/14ms)
23/03/12 11:40:38 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
23/03/12 11:40:39 WARN SchedulerConfGenerator: Job Scheduling Configs will
not be in effect as spark.scheduler.mode is not set to FAIR at instantiation
time. Continuing without scheduling configs
23/03/12 11:40:39 INFO SparkContext: Running Spark version 3.3.2
23/03/12 11:40:39 INFO ResourceUtils:
==============================================================
23/03/12 11:40:39 INFO ResourceUtils: No custom resources configured for
spark.driver.
23/03/12 11:40:39 INFO ResourceUtils:
==============================================================
23/03/12 11:40:39 INFO SparkContext: Submitted application:
delta-streamer-sbr_ped_venda
23/03/12 11:40:39 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 2048, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
23/03/12 11:40:39 INFO ResourceProfile: Limiting resource is cpu
23/03/12 11:40:39 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/03/12 11:40:39 INFO SecurityManager: Changing view acls to: root
23/03/12 11:40:39 INFO SecurityManager: Changing modify acls to: root
23/03/12 11:40:39 INFO SecurityManager: Changing view acls groups to:
23/03/12 11:40:39 INFO SecurityManager: Changing modify acls groups to:
23/03/12 11:40:39 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); groups
with view permissions: Set(); users with modify permissions: Set(root); groups
with modify permissions: Set()
23/03/12 11:40:39 INFO deprecation: mapred.output.compression.codec is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
23/03/12 11:40:39 INFO deprecation: mapred.output.compression.type is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
23/03/12 11:40:39 INFO deprecation: mapred.output.compress is deprecated.
Instead, use mapreduce.output.fileoutputformat.compress
23/03/12 11:40:40 INFO Utils: Successfully started service 'sparkDriver' on
port 45603.
23/03/12 11:40:40 INFO SparkEnv: Registering MapOutputTracker
23/03/12 11:40:40 INFO SparkEnv: Registering BlockManagerMaster
23/03/12 11:40:40 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/03/12 11:40:40 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
23/03/12 11:40:40 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/03/12 11:40:40 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-8cfb4a38-de92-4ea7-881f-e49d4d4750ea
23/03/12 11:40:40 INFO MemoryStore: MemoryStore started with capacity 434.4
MiB
23/03/12 11:40:40 INFO SparkEnv: Registering OutputCommitCoordinator
23/03/12 11:40:40 INFO Utils: Successfully started service 'SparkUI' on port
8090.
23/03/12 11:40:40 INFO SparkContext: Added JAR
file:///root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar at
spark://spark-hudi.smartbr.com:45603/jars/hudi-utilities-bundle_2.12-0.13.0.jar
with timestamp 1678632039164
23/03/12 11:40:40 INFO SparkContext: Added JAR
file:///root/hudi/spark-avro_2.13-3.3.2.jar at
spark://spark-hudi.smartbr.com:45603/jars/spark-avro_2.13-3.3.2.jar with
timestamp 1678632039164
23/03/12 11:40:40 INFO SparkContext: Added JAR
file:///root/.ivy2/jars/org.apache.hadoop_hadoop-aws-3.3.4.jar at
spark://spark-hudi.smartbr.com:45603/jars/org.apache.hadoop_hadoop-aws-3.3.4.jar
with timestamp 1678632039164
23/03/12 11:40:40 INFO SparkContext: Added JAR
file:///root/.ivy2/jars/com.amazonaws_aws-java-sdk-bundle-1.12.262.jar at
spark://spark-hudi.smartbr.com:45603/jars/com.amazonaws_aws-java-sdk-bundle-1.12.262.jar
with timestamp 1678632039164
23/03/12 11:40:40 INFO SparkContext: Added JAR
file:///root/.ivy2/jars/org.wildfly.openssl_wildfly-openssl-1.0.7.Final.jar at
spark://spark-hudi.smartbr.com:45603/jars/org.wildfly.openssl_wildfly-openssl-1.0.7.Final.jar
with timestamp 1678632039164
23/03/12 11:40:40 INFO SparkContext: The JAR
file:/root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar at
spark://spark-hudi.smartbr.com:45603/jars/hudi-utilities-bundle_2.12-0.13.0.jar
has been added already. Overwriting of added jar is not supported in the
current version.
23/03/12 11:40:40 INFO StandaloneAppClient$ClientEndpoint: Connecting to
master spark://0.0.0.0:7077...
23/03/12 11:40:41 INFO TransportClientFactory: Successfully created
connection to /0.0.0.0.0:7077 after 50 ms (0 ms spent in bootstraps)
23/03/12 11:40:41 INFO StandaloneSchedulerBackend: Connected to Spark
cluster with app ID app-20230312114041-0014
23/03/12 11:40:41 INFO StandaloneAppClient$ClientEndpoint: Executor added:
app-20230312114041-0014/0 on worker-20230312102039-0.0.0.0-40923
(192.168.200.15:40923) with 2 core(s)
23/03/12 11:40:41 INFO StandaloneSchedulerBackend: Granted executor ID
app-20230312114041-0014/0 on hostPort 0.0.0.0:40923 with 2 core(s), 2.0 GiB RAM
23/03/12 11:40:41 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 44591.
23/03/12 11:40:41 INFO NettyBlockTransferService: Server created on
spark-hudi.smartbr.com:44591
23/03/12 11:40:41 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
23/03/12 11:40:41 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, spark-hudi.smartbr.com, 44591, None)
23/03/12 11:40:41 INFO BlockManagerMasterEndpoint: Registering block manager
spark-hudi:44591 with 434.4 MiB RAM, BlockManagerId(driver, spark-hudi, 44591,
None)
23/03/12 11:40:41 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, spark-hudi, 44591, None)
23/03/12 11:40:41 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, spark-hudi, 44591, None)
23/03/12 11:40:41 INFO StandaloneAppClient$ClientEndpoint: Executor updated:
app-20230312114041-0014/0 is now RUNNING
23/03/12 11:40:41 INFO StandaloneSchedulerBackend: SchedulerBackend is ready
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/03/12 11:40:42 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
23/03/12 11:40:42 INFO MetricsSystemImpl: Scheduled Metric snapshot period
at 10 second(s).
23/03/12 11:40:42 INFO MetricsSystemImpl: s3a-file-system metrics system
started
23/03/12 11:40:44 WARN DFSPropertiesConfiguration: Cannot find
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/03/12 11:40:44 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
23/03/12 11:40:44 INFO UtilHelpers: Adding overridden properties to file
properties.
23/03/12 11:40:44 WARN SparkContext: Using an existing SparkContext; some
configuration may not take effect.
23/03/12 11:40:45 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from s3a://hudi/sbr_ped_venda
23/03/12 11:40:45 INFO HoodieTableConfig: Loading table properties from
s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
23/03/12 11:40:45 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
23/03/12 11:40:45 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir.
23/03/12 11:40:45 INFO SharedState: Warehouse path is
'file:/root/spark-warehouse'.
23/03/12 11:40:48 INFO HoodieDeltaStreamer: Creating delta streamer with
configs:
auto.offset.reset: earliest
bootstrap.servers: 0.0.0.0:9092
group.id: datalake
hoodie.auto.adjust.lock.configs: true
hoodie.datasource.write.hive_style_partitioning: false
hoodie.datasource.write.keygenerator.class:
org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: id
hoodie.deltastreamer.schemaprovider.registry.url:
http://0.0.0.0:8081/subjects/pgprd.public.sbr_ped_venda-value/versions/latest
hoodie.deltastreamer.source.kafka.group.id: datalake
hoodie.deltastreamer.source.kafka.topic: pgprd.public.sbr_ped_venda
hoodie.deltastreamer.source.kafka.value.deserializer.class:
io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url: http://0.0.0.0.:8081
validate.non.null: false
23/03/12 11:40:48 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from s3a://hudi/sbr_ped_venda
23/03/12 11:40:48 INFO HoodieTableConfig: Loading table properties from
s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
23/03/12 11:40:48 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
23/03/12 11:40:48 INFO HoodieActiveTimeline: Loaded instants upto :
Optional.empty
23/03/12 11:40:49 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from s3a://hudi/sbr_ped_venda
23/03/12 11:40:49 INFO HoodieTableConfig: Loading table properties from
s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
23/03/12 11:40:49 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
23/03/12 11:40:49 INFO HoodieActiveTimeline: Loaded instants upto :
Optional.empty
23/03/12 11:40:49 INFO DeltaSync: Checkpoint to resume from : Optional.empty
23/03/12 11:40:49 INFO ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [0.0.0.0:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = datalake
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class
org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class
org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class
io.confluent.kafka.serializers.KafkaAvroDeserializer
23/03/12 11:40:49 INFO KafkaAvroDeserializerConfig:
KafkaAvroDeserializerConfig values:
bearer.auth.token = [hidden]
schema.registry.url = [http://0.0.0.0:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
bearer.auth.credentials.source = STATIC_TOKEN
specific.avro.reader = false
value.subject.name.strategy = class
io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class
io.confluent.kafka.serializers.subject.TopicNameStrategy
23/03/12 11:40:49 WARN ConsumerConfig: The configuration 'validate.non.null'
was supplied but isn't a known config.
23/03/12 11:40:49 WARN ConsumerConfig: The configuration
'hoodie.deltastreamer.source.kafka.value.deserializer.class' was supplied but
isn't a known config.
23/03/12 11:40:49 INFO AppInfoParser: Kafka version: 2.4.1
23/03/12 11:40:49 INFO AppInfoParser: Kafka commitId: c57222ae8cd7866b
23/03/12 11:40:49 INFO AppInfoParser: Kafka startTimeMs: 1678632049794
23/03/12 11:40:50 INFO Metadata: [Consumer clientId=consumer-datalake-1,
groupId=datalake] Cluster ID: 0EpmDQKcTSK7kGDKCviAwQ
23/03/12 11:40:50 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.200.15:56696) with ID 0, ResourceProfileId 0
23/03/12 11:40:50 INFO KafkaOffsetGen: SourceLimit not configured, set
numEvents to default value : 5000000
23/03/12 11:40:50 INFO DebeziumSource: About to read 1532804 from Kafka for
topic :pgprd.public.sbr_ped_venda
23/03/12 11:40:51 WARN KafkaUtils: overriding enable.auto.commit to false
for executor
23/03/12 11:40:51 WARN KafkaUtils: overriding auto.offset.reset to none for
executor
23/03/12 11:40:51 WARN KafkaUtils: overriding executor group.id to
spark-executor-datalake
23/03/12 11:40:51 WARN KafkaUtils: overriding receive.buffer.bytes to 65536
see KAFKA-3135
23/03/12 11:40:51 INFO BlockManagerMasterEndpoint: Registering block manager
192.168.200.15:33183 with 1048.8 MiB RAM, BlockManagerId(0, 0.0.0.0, 33183,
None)
23/03/12 11:40:52 INFO SparkContext: Starting job: isEmpty at
AvroConversionUtils.scala:120
23/03/12 11:40:52 INFO DAGScheduler: Got job 0 (isEmpty at
AvroConversionUtils.scala:120) with 1 output partitions
23/03/12 11:40:52 INFO DAGScheduler: Final stage: ResultStage 0 (isEmpty at
AvroConversionUtils.scala:120)
23/03/12 11:40:52 INFO DAGScheduler: Parents of final stage: List()
23/03/12 11:40:52 INFO DAGScheduler: Missing parents: List()
23/03/12 11:40:52 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[1] at map at DebeziumSource.java:159), which has no missing
parents
23/03/12 11:40:52 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 6.0 KiB, free 434.4 MiB)
23/03/12 11:40:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 3.5 KiB, free 434.4 MiB)
23/03/12 11:40:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on spark-hudi.smartbr.com:44591 (size: 3.5 KiB, free: 434.4 MiB)
23/03/12 11:40:53 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:1513
23/03/12 11:40:53 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[1] at map at DebeziumSource.java:159) (first 15
tasks are for partitions Vector(0))
23/03/12 11:40:53 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
resource profile 0
23/03/12 11:40:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (0.0.0.0, executor 0, partition 0, PROCESS_LOCAL, 4378 bytes)
taskResourceAssignments Map()
23/03/12 11:40:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on 0.0.0.0:33183 (size: 3.5 KiB, free: 1048.8 MiB)
23/03/12 11:41:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 3868 ms on 0.0.0.0 (executor 0) (1/1)
23/03/12 11:41:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
23/03/12 11:41:02 INFO DAGScheduler: ResultStage 0 (isEmpty at
AvroConversionUtils.scala:120) finished in 9.974 s
23/03/12 11:41:02 INFO DAGScheduler: Job 0 is finished. Cancelling potential
speculative or zombie tasks for this job
23/03/12 11:41:02 INFO TaskSchedulerImpl: Killing all running tasks in stage
0: Stage finished
23/03/12 11:41:02 INFO DAGScheduler: Job 0 finished: isEmpty at
AvroConversionUtils.scala:120, took 10.185954 s
23/03/12 11:41:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
0.0.0.0:33183 in memory (size: 3.5 KiB, free: 1048.8 MiB)
23/03/12 11:41:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
spark-hudi.smartbr.com:44591 in memory (size: 3.5 KiB, free: 434.4 MiB)
23/03/12 11:41:05 WARN package: Truncated the string representation of a
plan since it was too large. This behavior can be adjusted by setting
'spark.sql.debug.maxToStringFields'.
**23/03/12 11:41:07 INFO DebeziumSource: Date fields: []**
23/03/12 11:41:07 INFO HoodieDeltaStreamer: Delta Sync shutdown. Error ?false
23/03/12 11:41:07 INFO HoodieDeltaStreamer: DeltaSync shutdown. Closing
write client. Error?true
23/03/12 11:41:07 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError:
'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
at
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:195)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:192)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError:
'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
at
org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
at
org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at
org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
at
org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
at
org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:173)
at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:172)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3396)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3394)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.convertColumnToNullable(DebeziumSource.java:220)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.toDataset(DebeziumSource.java:167)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:123)
at
org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
at
org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:71)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:530)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
23/03/12 11:41:07 INFO DeltaSync: Shutting down embedded timeline server
23/03/12 11:41:07 INFO SparkUI: Stopped Spark web UI at
http://spark-hudi:8090
23/03/12 11:41:07 INFO StandaloneSchedulerBackend: Shutting down all
executors
23/03/12 11:41:07 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking
each executor to shut down
23/03/12 11:41:08 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
23/03/12 11:41:08 INFO MemoryStore: MemoryStore cleared
23/03/12 11:41:08 INFO BlockManager: BlockManager stopped
23/03/12 11:41:08 INFO BlockManagerMaster: BlockManagerMaster stopped
23/03/12 11:41:08 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
23/03/12 11:41:08 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieException:
java.lang.NoSuchMethodError:
'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:197)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:192)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException:
java.lang.NoSuchMethodError:
'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
at
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:195)
... 15 more
Caused by: java.lang.NoSuchMethodError:
'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
at
org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
at
org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
at
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at
org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
at
org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
at
org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
at
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:173)
at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:172)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3396)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3394)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.convertColumnToNullable(DebeziumSource.java:220)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.toDataset(DebeziumSource.java:167)
at
org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:123)
at
org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
at
org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:71)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:530)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
23/03/12 11:41:08 INFO ShutdownHookManager: Shutdown hook called
23/03/12 11:41:08 INFO ShutdownHookManager: Deleting directory
/tmp/spark-5dccedc5-681f-446c-987b-0e989840983e
23/03/12 11:41:08 INFO ShutdownHookManager: Deleting directory
/tmp/spark-c05b2c1a-a206-4c9d-9e2b-c4f414f7f03b
23/03/12 11:41:08 INFO MetricsSystemImpl: Stopping s3a-file-system metrics
system...
23/03/12 11:41:08 INFO MetricsSystemImpl: s3a-file-system metrics system
stopped.
23/03/12 11:41:08 INFO MetricsSystemImpl: s3a-file-system metrics system
shutdown complete.