shubhamn21 opened a new issue, #10127:
URL: https://github.com/apache/hudi/issues/10127

   **Describe the problem you faced**
   
   The hudi job runs fine for an hour but then crashes after a Warning about 
`Clean Action failure` and subsequently raising an exception 
   `org.apache.hudi.exception.HoodieIOException: Could not check if 
s3a://xyz-bucket/spark-warehouse/xyz-table-name/.hoodie/metadata is a valid 
table `
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version : 3.3.0
   
   * Java version : 1.8 
   
   * Storage (HDFS/S3/GCS..) : S3A bucket
   
   * Running on Docker? (yes/no) : Yes, Spark on kubernetes
   
   
   **Stacktrace**
   
   ```23/11/17 04:00:55 INFO HoodieTableMetaClient: Loading 
HoodieTableMetaClient from 
s3a://xyz-bucket/spark-warehouse/xyz_table_name/.hoodie/metadata
   23/11/17 04:00:55 WARN CleanActionExecutor: Failed to perform previous clean 
operation, instant: [==>20231117040045263__clean__REQUESTED]
   org.apache.hudi.exception.HoodieIOException: Could not check if 
s3a://xyz-bucket/spark-warehouse/xyz_table_name/.hoodie/metadata is a valid 
table
        at 
org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:59)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:137)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:689)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:81)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:770)
        at 
org.apache.hudi.common.table.HoodieTableMetaClient.reload(HoodieTableMetaClient.java:174)
        at 
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:153)
        at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:838)
        at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:918)
        at 
org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$1(BaseActionExecutor.java:68)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:68)
        at 
org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:221)
        at 
org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:187)
        at 
org.apache.hudi.table.action.clean.CleanActionExecutor.lambda$execute$8(CleanActionExecutor.java:256)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at 
org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:250)
        at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.clean(HoodieSparkCopyOnWriteTable.java:263)
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:557)
        at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:759)
        at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:731)
        at 
org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   Caused by: java.io.InterruptedIOException: getFileStatus on 
s3a://xyz-bucket/spark-warehouse/xyz_table_name/.hoodie/metadata/.hoodie: 
com.amazonaws.AbortedException: 
        at 
org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:395)
        at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201)
        at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3799)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$24(S3AFileSystem.java:3556)
        at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
        at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3554)
        at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
        at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
        at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
        at 
org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
        ... 25 more
   Caused by: com.amazonaws.AbortedException: 
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleInterruptedException(AmazonHttpClient.java:868)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:746)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
        at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
        at 
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
        at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
        ... 36 more
   Caused by: com.amazonaws.http.timers.client.SdkInterruptedException
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.checkInterrupted(AmazonHttpClient.java:923)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.checkInterrupted(AmazonHttpClient.java:909)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1103)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        ... 49 more
   23/11/17 04:00:55 INFO SparkUI: Stopped Spark web UI at 
   23/11/17 04:00:55 INFO KubernetesClusterSchedulerBackend: Shutting down all 
executors
   23/11/17 04:00:55 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
   23/11/17 04:00:55 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
has been closed.
   23/11/17 04:00:55 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
   23/11/17 04:00:55 INFO MemoryStore: MemoryStore cleared
   23/11/17 04:00:55 INFO BlockManager: BlockManager stopped
   23/11/17 04:00:56 INFO BlockManagerMaster: BlockManagerMaster stopped
   23/11/17 04:00:56 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
   23/11/17 04:00:56 INFO SparkContext: Successfully stopped SparkContext```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to