ROOBALJINDAL opened a new issue, #6348: URL: https://github.com/apache/hudi/issues/6348
**Environment:**

AWS EMR cluster: emr-6.7.0
Hudi: 0.11.0
Hive: 3.1.3
Hadoop: 3.2.1
Spark: 3.2.1

**Steps to reproduce the behavior:**

I am running HoodieMultiTableDeltaStreamer, for now with a single table only. We use SQL Server for CDC, with the Debezium connector pushing changes into Kafka; the streamer then reads from Kafka and writes the records into Hudi/Hive tables.

**kafka-source.properties**
```
hoodie.deltastreamer.ingestion.tablesToBeIngested=default.rrmencounter
hoodie.deltastreamer.ingestion.default.rrmencounter.configFile=s3://hudi-multistreamer-roobal/hudi-ingestion-config/rrmencounter-config.properties
auto.offset.reset=earliest
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://10.151.46.161:8080/apis/ccompat/v6
bootstrap.servers=b-2.rjtest12mskcluster.sx7vgn.c13.kafka.us-west-2.amazonaws.com:9092,b-1.rjtest12mskcluster.sx7vgn.c13.kafka.us-west-2.amazonaws.com:9092
```

**rrmencounter-config.properties**
```
hoodie.datasource.write.recordkey.field=rrmencountersid
hoodie.datasource.write.partitionpath.field=receiptdt
hoodie.deltastreamer.ingestion.targetBasePath=s3://hudi-multistreamer-roobal/hudi/rrmencounter
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=rrmencounter
hoodie.datasource.hive_sync.partition_fields=receiptdt
hoodie.datasource.hive_sync.support_timestamp=true
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
hoodie.deltastreamer.keygen.timebased.timezone=GMT+8:00
hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
hoodie.datasource.write.hive_style_partitioning=true
hoodie.deltastreamer.source.kafka.topic=ROOBJIN-LW13206.dbo.rrmencounter
hoodie.deltastreamer.schemaprovider.registry.url=http://10.151.46.161:8080/apis/ccompat/v6/subjects/ROOBJIN-LW13206.dbo.rrmencounter-value/versions/latest
hoodie.deltastreamer.schemaprovider.registry.targetUrl=http://10.151.46.161:8080/apis/ccompat/v6/subjects/ROOBJIN-LW13206.dbo.rrmencounter-value/versions/latest
```

**Spark command:**
```
spark-submit \
  --jars "/usr/lib/spark/external/lib/spark-avro.jar" \
  --master yarn --deploy-mode client \
  --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer s3://slava-redshift-test/hudi/hudi-utilities-bundle_2.12-0.11.0_edfx.jar \
  --props s3://hudi-multistreamer-roobal/hudi-ingestion-config/kafka-source.properties \
  --config-folder s3://hudi-multistreamer-roobal/hudi-ingestion-config/ \
  --payload-class org.apache.hudi.common.model.debezium.MssqlDebeziumAvroPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.debezium.MssqlDebeziumSource \
  --source-ordering-field _event_lsn \
  --min-sync-interval-seconds 60 \
  --enable-hive-sync \
  --table-type COPY_ON_WRITE \
  --base-path-prefix s3:///hudi-multistreamer-roobal/hudi \
  --target-table rrmencounter --continuous \
  --op UPSERT
```

**Output**: The job creates multi-level partition directories such as 2022/06/11, but neither the Hive table nor any Parquet data file is generated.

**Please note:** HoodieDeltaStreamer works fine for the same table with the following spark command, but HoodieMultiTableDeltaStreamer does not.
Maybe I am missing something.

**The following works fine for HoodieDeltaStreamer:**
```
spark-submit \
  --jars "/usr/lib/spark/external/lib/spark-avro.jar" \
  --master yarn --deploy-mode client \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer s3://slava-redshift-test/hudi/hudi-utilities-bundle_2.12-0.11.0_edfx.jar \
  --table-type COPY_ON_WRITE --op UPSERT \
  --target-base-path s3://hudi-multistreamer-roobal/hudi/rrmencounter \
  --target-table rrmencounter --continuous \
  --min-sync-interval-seconds 60 \
  --source-class org.apache.hudi.utilities.sources.debezium.MssqlDebeziumSource \
  --source-ordering-field _event_lsn \
  --payload-class org.apache.hudi.common.model.debezium.MssqlDebeziumAvroPayload \
  --hoodie-conf schema.registry.url=http://10.151.46.161:8080/apis/ccompat/v6 \
  --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://10.151.46.161:8080/apis/ccompat/v6/subjects/ROOBJIN-LW13206.dbo.rrmencounter-value/versions/latest \
  --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=ROOBJIN-LW13206.dbo.rrmencounter \
  --hoodie-conf auto.offset.reset=earliest \
  --hoodie-conf hoodie.datasource.write.recordkey.field=rrmencountersid \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=receiptdt \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd \
  --hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT+8:00 \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator \
  --hoodie-conf hoodie.datasource.hive_sync.support_timestamp=true \
  --enable-hive-sync \
  --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
  --hoodie-conf hoodie.datasource.hive_sync.database=default \
  --hoodie-conf hoodie.datasource.hive_sync.table=rrmencounter \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=receiptdt \
  --hoodie-conf hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true \
  --hoodie-conf bootstrap.servers=b-2.rjtest12mskcluster.sx7vgn.c13.kafka.us-west-2.amazonaws.com:9092,b-1.rjtest12mskcluster.sx7vgn.c13.kafka.us-west-2.amazonaws.com:9092
```

**Stacktrace**

22/08/09 11:21:53 INFO SparkContext: Starting job: collect at SparkHoodieBackedTableMetadataWriter.java:166
22/08/09 11:21:53 INFO DAGScheduler: Job 21 finished: collect at SparkHoodieBackedTableMetadataWriter.java:166, took 0.000510 s
22/08/09 11:21:53 INFO MultipartUploadOutputStream: close closed:false s3://hudi-multistreamer-roobal/hudi/rrmencounter/.hoodie/20220809112149403.rollback
22/08/09 11:21:53 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:151
22/08/09 11:21:53 INFO DAGScheduler: Got job 22 (collectAsMap at HoodieSparkEngineContext.java:151) with 2 output partitions
22/08/09 11:21:53 INFO DAGScheduler: Final stage: ResultStage 51 (collectAsMap at HoodieSparkEngineContext.java:151)
22/08/09 11:21:53 INFO DAGScheduler: Parents of final stage: List()
22/08/09 11:21:53 INFO DAGScheduler: Missing parents: List()
22/08/09 11:21:53 INFO DAGScheduler: Submitting ResultStage 51 (MapPartitionsRDD[100] at mapToPair at HoodieSparkEngineContext.java:148), which has no missing parents
22/08/09 11:21:53 INFO MemoryStore: Block broadcast_29 stored as values in memory (estimated size 115.5 KiB, free 911.4 MiB)
22/08/09 11:21:53 INFO MemoryStore: Block broadcast_29_piece0 stored as bytes in memory (estimated size 41.1 KiB, free 911.4 MiB)
22/08/09 11:21:53 INFO BlockManagerInfo: Added broadcast_29_piece0 in memory on ip-10-151-46-163.us-west-2.compute.internal:40011 (size: 41.1 KiB, free: 912.2 MiB)
22/08/09 11:21:53 INFO SparkContext: Created broadcast 29 from broadcast at DAGScheduler.scala:1518
22/08/09 11:21:53 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 51 (MapPartitionsRDD[100] at mapToPair at HoodieSparkEngineContext.java:148) (first 15 tasks are for partitions Vector(0, 1))
22/08/09 11:21:53 INFO YarnScheduler: Adding task set 51.0 with 2 tasks resource profile 0
22/08/09 11:21:53 INFO TaskSetManager: Starting task 0.0 in stage 51.0 (TID 2020) (ip-10-151-46-136.us-west-2.compute.internal, executor 1, partition 0, PROCESS_LOCAL, 4440 bytes) taskResourceAssignments Map()
22/08/09 11:21:53 INFO TaskSetManager: Starting task 1.0 in stage 51.0 (TID 2021) (ip-10-151-46-136.us-west-2.compute.internal, executor 1, partition 1, PROCESS_LOCAL, 4436 bytes) taskResourceAssignments Map()
22/08/09 11:21:53 INFO BlockManagerInfo: Added broadcast_29_piece0 in memory on ip-10-151-46-136.us-west-2.compute.internal:34971 (size: 41.1 KiB, free: 1846.7 MiB)
22/08/09 11:21:53 INFO TaskSetManager: Finished task 1.0 in stage 51.0 (TID 2021) in 52 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (1/2)
22/08/09 11:21:53 INFO TaskSetManager: Finished task 0.0 in stage 51.0 (TID 2020) in 91 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (2/2)
22/08/09 11:21:53 INFO YarnScheduler: Removed TaskSet 51.0, whose tasks have all completed, from pool
22/08/09 11:21:53 INFO DAGScheduler: ResultStage 51 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.101 s
22/08/09 11:21:53 INFO DAGScheduler: Job 22 is finished. Cancelling potential speculative or zombie tasks for this job
22/08/09 11:21:53 INFO YarnScheduler: Killing all running tasks in stage 51: Stage finished
22/08/09 11:21:53 INFO DAGScheduler: Job 22 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.105640 s
22/08/09 11:21:53 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
22/08/09 11:21:53 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:189)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:186)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.sync(HoodieMultiTableDeltaStreamer.java:416)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:709)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
    ... 4 more
22/08/09 11:21:53 ERROR HoodieMultiTableDeltaStreamer: error while running MultiTableDeltaStreamer for table: rrmencounter
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:191)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:186)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.sync(HoodieMultiTableDeltaStreamer.java:416)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:189)
    ... 16 more
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:709)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
    ... 4 more
22/08/09 11:21:53 INFO Javalin: Stopping Javalin ...
22/08/09 11:21:53 INFO AbstractConnector: Stopped Spark@64387c17{HTTP/1.1, (http/1.1)}{0.0.0.0:8090}
22/08/09 11:21:53 ERROR Javalin: Javalin failed to stop gracefully
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.hudi.org.eclipse.jetty.server.AbstractConnector.doStop(AbstractConnector.java:333)
    at org.apache.hudi.org.eclipse.jetty.server.AbstractNetworkConnector.doStop(AbstractNetworkConnector.java:88)
    at org.apache.hudi.org.eclipse.jetty.server.ServerConnector.doStop(ServerConnector.java:248)
    at org.apache.hudi.org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at org.apache.hudi.org.eclipse.jetty.server.Server.doStop(Server.java:450)
    at org.apache.hudi.org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
    at io.javalin.Javalin.stop(Javalin.java:195)
    at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:334)
    at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:137)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.close(DeltaSync.java:876)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.close(HoodieDeltaStreamer.java:811)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.onDeltaSyncShutdown(HoodieDeltaStreamer.java:222)
    at org.apache.hudi.async.HoodieAsyncService.lambda$shutdownCallback$0(HoodieAsyncService.java:171)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
22/08/09 11:21:53 INFO Javalin: Javalin has stopped
22/08/09 11:21:53 INFO SparkUI: Stopped Spark web UI at http://ip-10-151-46-163.us-west-2.compute.internal:8090
22/08/09 11:21:53 INFO YarnClientSchedulerBackend: Interrupting monitor thread
22/08/09 11:21:53 INFO YarnClientSchedulerBackend: Shutting down all executors
22/08/09 11:21:53 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/08/09 11:21:53 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
22/08/09 11:21:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/08/09 11:21:53 INFO MemoryStore: MemoryStore cleared
22/08/09 11:21:53 INFO BlockManager: BlockManager stopped
22/08/09 11:21:53 INFO BlockManagerMaster: BlockManagerMaster stopped
22/08/09 11:21:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/08/09 11:21:53 INFO SparkContext: Successfully stopped SparkContext
22/08/09 11:21:53 INFO ShutdownHookManager: Shutdown hook called
22/08/09 11:21:53 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-1ac0b680-8bcb-46e4-89f7-3da932646847
22/08/09 11:21:53 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-2b324903-b982-45d3-944a-647af07783f9

org.apache.hudi.exception.HoodieException: Commit 20220809112130103 failed and rolled-back !
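
For reference, the 2022/06/11-style directories mentioned under **Output** are consistent with the key generator settings in rrmencounter-config.properties (`EPOCHMILLISECONDS`, `yyyy/MM/dd`, `GMT+8:00`). The sketch below only illustrates that conversion in Python; it is not Hudi's implementation, the helper name `partition_value` is hypothetical, the sample timestamps are made up rather than taken from the `receiptdt` column, and any `receiptdt=` prefix that hive-style partitioning may add is left out.

```python
from datetime import datetime, timedelta, timezone

# Values mirrored from rrmencounter-config.properties.
OUTPUT_DATE_FORMAT = "%Y/%m/%d"          # strftime equivalent of yyyy/MM/dd
TIMEZONE = timezone(timedelta(hours=8))  # GMT+8:00


def partition_value(epoch_millis: int) -> str:
    """Format an epoch-milliseconds value as yyyy/MM/dd in GMT+8:00.

    Hypothetical helper for illustration only; it is not part of Hudi.
    """
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=TIMEZONE)
    return moment.strftime(OUTPUT_DATE_FORMAT)


# 2022-06-11 00:00:00 UTC is still 11 June in GMT+8:00.
print(partition_value(1654905600000))  # 2022/06/11
# 2022-06-11 16:00:00 UTC is already 12 June in GMT+8:00.
print(partition_value(1654963200000))  # 2022/06/12
```

Because the output format contains slashes, each formatted value maps to three nested directory levels, which matches the multi-level directories observed on S3.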
