koldic opened a new issue, #7209:
URL: https://github.com/apache/hudi/issues/7209
**Describe the problem you faced**
Hudi DeltaStreamer fails with the exception `Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata`.
**To Reproduce**
Unknown
**Expected behavior**
Hudi finishes the clean and DeltaStreamer keeps running correctly.
**Environment Description**
* Hudi version : 0.12.1
* Spark version : 2.4.8
* Hadoop version : 3.1.4.0-315
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : NO
**Additional context**
DeltaStreamer ran for a few days and worked correctly. Occasionally it failed due to insufficient memory, was rolled back, and then worked fine again. It suddenly failed over the weekend, and every rerun since then has ended with this exception.
```
Settings:
hoodie.upsert.shuffle.parallelism=200
hoodie.clean.automatic=true
hoodie.clean.async=true
hoodie.metadata.enable=false
hoodie.write.markers.type=direct
# Key fields
hoodie.datasource.write.keygenerator.class=cz.seznam.datacollect.hit.app.sync.keygen.AugmentedCustomKeyGenerator
hoodie.datasource.write.recordkey.field=random
hoodie.datasource.write.partitionpath.field=create_tst:timestamp,app:hivestyle,os:hivestyle
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyyMMdd'T'HH:mm:ss.SSS
hoodie.deltastreamer.keygen.timebased.timezone=UTC
hoodie.deltastreamer.keygen.timebased.output.dateformat='year='yyyy/'month='MM/'week='ww/'day='dd/'hour='HH
hoodie.deltastreamer.schemaprovider.source.schema.file={{ env ("HITS_APP_SCHEMA_PATH") }}
hoodie.deltastreamer.kafka.source.maxEvents=1000000
```
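The timestamp part of `hoodie.datasource.write.partitionpath.field` turns each `create_tst` value into a directory path through the input and output date formats in the settings. A minimal JDK-only sketch of that mapping, handy for checking which partition a given `create_tst` should land in, rests on several assumptions: `AugmentedCustomKeyGenerator` is a custom class not shown here, so the sketch assumes its `create_tst:timestamp` part formats dates like Hudi's built-in timestamp-based key generation; `ww` is taken to mean the ISO week number, which may differ from the week rule of the formatter actually in use; `PartitionPathSketch` and `timestampPartition` are hypothetical names. The `app` and `os` parts of the path are left out.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;

public class PartitionPathSketch {

    // Same pattern as hoodie.deltastreamer.keygen.timebased.input.dateformat.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HH:mm:ss.SSS");

    // Hand-built equivalent of the configured output format
    // 'year='yyyy/'month='MM/'week='ww/'day='dd/'hour='HH,
    // ASSUMING "ww" means the ISO week of the week-based year.
    private static final DateTimeFormatter OUTPUT = new DateTimeFormatterBuilder()
            .appendLiteral("year=").appendValue(ChronoField.YEAR, 4)
            .appendLiteral("/month=").appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral("/week=").appendValue(IsoFields.WEEK_OF_WEEK_BASED_YEAR, 2)
            .appendLiteral("/day=").appendValue(ChronoField.DAY_OF_MONTH, 2)
            .appendLiteral("/hour=").appendValue(ChronoField.HOUR_OF_DAY, 2)
            .toFormatter();

    // Hypothetical helper: the input string carries no zone, so it is read
    // as UTC, matching hoodie.deltastreamer.keygen.timebased.timezone=UTC.
    static String timestampPartition(String createTst) {
        ZonedDateTime utc = LocalDateTime.parse(createTst, INPUT).atZone(ZoneOffset.UTC);
        return OUTPUT.format(utc);
    }

    public static void main(String[] args) {
        // prints year=2022/month=11/week=46/day=15/hour=12
        System.out.println(timestampPartition("20221115T12:32:28.267"));
    }
}
```

With ISO weeks, `year=` (calendar year) and `week=` (week-based year) can disagree around New Year: `20230101T00:00:00.000` maps to `year=2023/month=01/week=52/day=01/hour=00`.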
**Stacktrace**
```
2022-11-15 12:32:28,267 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 32.0 (TID 14326, cider81.ng.seznam.cz, executor 44, partition 0, PROCESS_LOCAL, 7791 bytes)
2022-11-15 12:32:28,274 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on cider81.ng.seznam.cz:8496 (size: 36.5 KB, free: 5.2 GB)
2022-11-15 12:32:28,300 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 32.0 (TID 14326) in 34 ms on cider81.ng.seznam.cz (executor 44) (1/1)
2022-11-15 12:32:28,300 INFO cluster.YarnClusterScheduler: Removed TaskSet 32.0, whose tasks have all completed, from pool
2022-11-15 12:32:28,300 INFO scheduler.DAGScheduler: ResultStage 32 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.052 s
2022-11-15 12:32:28,300 INFO scheduler.DAGScheduler: Job 12 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.053427 s
2022-11-15 12:32:28,321 INFO fs.FSUtils: Removed directory at /hits/app/hudi_cileni/.hoodie/.temp/20221115122551524
2022-11-15 12:32:28,321 INFO client.BaseHoodieWriteClient: Async cleaner has been spawned. Waiting for it to finish
2022-11-15 12:32:28,321 INFO async.AsyncCleanerService: Waiting for async clean service to finish
2022-11-15 12:32:28,326 ERROR async.HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
	at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:609)
	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:533)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:237)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:626)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:336)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:704)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
	at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:205)
	at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.getCommitsSinceLastCleaning(CleanPlanActionExecutor.java:72)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.needsCleaning(CleanPlanActionExecutor.java:87)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:169)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:204)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1354)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:827)
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
	... 4 more
2022-11-15 12:32:28,333 ERROR deltastreamer.HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.exception.HoodieException: Error waiting for async clean service to finish
	at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:77)
	at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:609)
	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:533)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:237)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:626)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:336)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:704)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
	... 11 more
Caused by: java.lang.IllegalArgumentException: Could not deserialize metadata of type class org.apache.hudi.avro.model.HoodieCleanMetadata
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
	at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:205)
	at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:170)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.getCommitsSinceLastCleaning(CleanPlanActionExecutor.java:72)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.needsCleaning(CleanPlanActionExecutor.java:87)
	at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:169)
	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:204)
	at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1354)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
	at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:827)
	at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
	... 4 more
2022-11-15 12:32:28,333 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync shutdown. Error ?true
2022-11-15 12:32:28,333 WARN deltastreamer.HoodieDeltaStreamer: Gracefully shutting down compactor
2022-11-15 12:32:30,964 INFO async.AsyncCompactService: Compactor shutting down properly!!
2022-11-15 12:32:30,966 INFO deltastreamer.HoodieDeltaStreamer: DeltaSync shutdown. Closing write client. Error?true
```
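Judging by the frame names in the trace, the failure happens while `CleanPlanActionExecutor.getCommitsSinceLastCleaning` deserializes `HoodieCleanMetadata`, presumably for the most recent completed clean instant on the timeline, so that instant's file under `.hoodie` is a natural first thing to inspect. The JDK-only sketch below picks it out of a directory listing. `LastCleanFinder` and `latestCompletedClean` are hypothetical names, the sample listing is made up, and the sketch assumes the usual timeline naming in which a completed clean is `<instantTime>.clean` while `.clean.requested` and `.clean.inflight` mark unfinished ones.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LastCleanFinder {

    // Hypothetical helper: returns the file name of the newest completed clean
    // instant. Completed cleans end in ".clean"; requested and inflight cleans
    // end in ".clean.requested" / ".clean.inflight" and are therefore skipped.
    // Assumes all instant times share one fixed-width timestamp format, so
    // string order equals time order.
    static Optional<String> latestCompletedClean(List<String> timelineFileNames) {
        return timelineFileNames.stream()
                .filter(name -> name.endsWith(".clean"))
                .max(String::compareTo);
    }

    public static void main(String[] args) {
        // Illustrative listing only; these instant times are made up.
        List<String> listing = Arrays.asList(
                "20221114101500123.commit",
                "20221114101600456.clean.requested",
                "20221114101600456.clean.inflight",
                "20221114101600456.clean",
                "20221115120000000.commit",
                "20221115123000000.clean.requested",
                "hoodie.properties");
        // prints 20221114101600456.clean
        System.out.println(latestCompletedClean(listing).orElse("no completed clean"));
    }
}
```

The file names could come from `hdfs dfs -ls /hits/app/hudi_cileni/.hoodie` (the table path seen in the log); a quick check is whether the returned file looks empty or much smaller than earlier completed `.clean` files.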