stym06 opened a new issue #3961:
URL: https://github.com/apache/hudi/issues/3961


   **Describe the problem you faced**
   
   The Deltastreamer tool fails while scheduling the clean operation after a commit. The root cause in the stack trace below is `java.io.IOException: Not an Avro data file`, thrown while deserializing clean metadata from the timeline.
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : 2.4.4
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.1.2
   
   * Storage (HDFS/S3/GCS..) : WASB (Azure Blob Storage)
   
   * Running on Docker? (yes/no) : K8s
   
   
   **Stacktrace**
   
   ```
   Exception in thread "main" org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Failed to schedule clean operation
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:161)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:156)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:480)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Failed to schedule clean operation
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:79)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:159)
        ... 15 more
   Caused by: org.apache.hudi.exception.HoodieException: Failed to schedule clean operation
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:614)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieIOException: Failed to schedule clean operation
        at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:95)
        at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:107)
        at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.execute(BaseCleanPlanActionExecutor.java:129)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:183)
        at org.apache.hudi.client.AbstractHoodieWriteClient.scheduleTableServiceInternal(AbstractHoodieWriteClient.java:951)
        at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:642)
        at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:630)
        at org.apache.hudi.client.AbstractHoodieWriteClient.clean(AbstractHoodieWriteClient.java:661)
        at org.apache.hudi.client.AbstractHoodieWriteClient.autoCleanOnCommit(AbstractHoodieWriteClient.java:494)
        at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:431)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
        at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:72)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:162)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:472)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:287)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:598)
        ... 4 more
   Caused by: java.io.IOException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:178)
        at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieCleanMetadata(TimelineMetadataUtils.java:152)
        at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsForCleanByCommits(CleanPlanner.java:150)
        at org.apache.hudi.table.action.clean.CleanPlanner.getPartitionPathsToClean(CleanPlanner.java:126)
        at org.apache.hudi.table.action.clean.BaseCleanPlanActionExecutor.requestClean(BaseCleanPlanActionExecutor.java:73)
        ... 20 more
   95.259: Application time: 0.0018910 seconds
   ```
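
   The innermost cause, `java.io.IOException: Not an Avro data file` raised from `TimelineMetadataUtils.deserializeHoodieCleanMetadata`, suggests that one of the completed clean instant files in the table's `.hoodie` timeline folder is empty or truncated, so the cleaner cannot plan the next clean. Below is a minimal diagnostic sketch, not part of Hudi or of this issue: the class name, default path, and file-name filter are assumptions for illustration. It simply tries to open each clean-related instant file with Avro's `DataFileReader` to locate the offending file.

   ```java
   import org.apache.avro.file.DataFileReader;
   import org.apache.avro.file.FileReader;
   import org.apache.avro.generic.GenericDatumReader;
   import org.apache.avro.generic.GenericRecord;

   import java.io.File;
   import java.io.IOException;

   // Hypothetical diagnostic: scan a (local copy of the) .hoodie timeline folder and
   // report which clean-related instant files Avro can and cannot open.
   public class FindCorruptCleanInstant {
     public static void main(String[] args) {
       // Assumption: args[0] points at the table's .hoodie directory, e.g. a copy
       // pulled down from WASB; the default path here is purely illustrative.
       File timelineDir = new File(args.length > 0 ? args[0] : "/tmp/hudi_table/.hoodie");
       File[] cleanFiles = timelineDir.listFiles((dir, name) -> name.contains(".clean"));
       if (cleanFiles == null || cleanFiles.length == 0) {
         System.err.println("No clean instant files found under " + timelineDir);
         return;
       }
       for (File f : cleanFiles) {
         // DataFileReader.openReader typically fails with "Not an Avro data file"
         // for empty or truncated files, the same error seen in the stack trace above.
         try (FileReader<GenericRecord> reader =
                  DataFileReader.openReader(f, new GenericDatumReader<>())) {
           System.out.println("OK      : " + f.getName()
               + " (schema: " + reader.getSchema().getName() + ")");
         } catch (IOException e) {
           System.out.println("CORRUPT : " + f.getName() + " (" + e.getMessage() + ")");
         }
       }
     }
   }
   ```

   The sketch only identifies the bad file and does not touch the timeline; whether moving such an instant file aside is a safe workaround is something to confirm with the Hudi maintainers before changing anything under `.hoodie`.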
   
   

