ChiehFu opened a new issue, #11036:
URL: https://github.com/apache/hudi/issues/11036

   **Describe the problem you faced**
   
   Hi,
   
   I am building a Flink SQL streaming pipeline on AWS EMR that compacts data into a Hudi COW table. Because of S3 slowdown errors that occasionally occurred during Hudi writes, I tried to enable the metadata table to eliminate S3 file listing, but ran into the exception below saying the S3 FS does not support atomic creation.
   
   This issue seems specific to Flink: I have another Spark/Hudi-based batch pipeline running on the same type of EMR cluster, and its Hudi metadata table works as expected against the S3 FS.
   
   Can you please help me with this issue? 
   
   Exception: 
   ```
   Caused by: org.apache.hudi.exception.HoodieLockException: Unsupported scheme :s3, since this fs can not support atomic creation
        at org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.<init>(FileSystemBasedLockProvider.java:89) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_402]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_402]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_402]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_402]
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:73) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:123) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:118) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:109) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.client.HoodieFlinkTableServiceClient.initMetadataTable(HoodieFlinkTableServiceClient.java:216) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.client.HoodieFlinkWriteClient.initMetadataTable(HoodieFlinkWriteClient.java:318) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.initMetadataTable(StreamWriteOperatorCoordinator.java:347) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:192) ~[hudi-flink1.17-bundle-0.14.1.jar:0.14.1]
        at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:181) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:165) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:82) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:618) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:1130) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:1047) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:439) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:198) ~[flink-dist-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.lambda$start$0(AkkaRpcActor.java:622) ~[flink-rpc-akka_f5fd373f-2282-403d-a522-72b822a720aa.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_f5fd373f-2282-403d-a522-72b822a720aa.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:621) ~[flink-rpc-akka_f5fd373f-2282-403d-a522-72b822a720aa.jar:1.17.1-amzn-1]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:190) ~[flink-rpc-akka_f5fd373f-2282-403d-a522-72b822a720aa.jar:1.17.1-amzn-1]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) ~[?:?]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) ~[?:?]
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) ~[flink-scala_2.12-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) ~[flink-scala_2.12-1.17.1-amzn-1.jar:1.17.1-amzn-1]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) ~[?:?]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) ~[flink-scala_2.12-1.17.1-amzn-1.jar:1.17.1-amzn-1]
   ```
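   
   From the stack trace, the failure comes from `FileSystemBasedLockProvider`, which (if I read the 0.14.x code correctly) the Flink writer appears to fall back to for guarding metadata table initialization when no lock provider is configured explicitly. Its constructor rejects filesystems without atomic file creation, and the `s3` scheme is one of those.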
   
   **To Reproduce**
   
   Flink SQL / Hudi table options:
   
   ```
    'path'='...',
    'connector' = 'hudi',
    'table.type' = 'COPY_ON_WRITE',
    'precombine.field' = 'integ_key',
    'write.precombine' = 'true',
    'write.bucket_assign.tasks' = '10',
    'write.task.max.size' = '2014',
    'write.operation' = 'upsert',
    'hoodie.datasource.write.recordkey.field' = 'key',
    'write.parquet.max.file.size' = '240',
    'index.bootstrap.enabled' = 'false',
    'write.index_bootstrap.tasks' = '200',
    'metadata.enabled' = 'true'
   ```
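   
   A workaround I am considering (not yet verified on EMR) is to point Hudi at a lock provider that does not rely on atomic file creation, by passing the corresponding `hoodie.*` options through the table definition. A minimal sketch, assuming this job is the only writer; the DynamoDB alternative below uses placeholder table/key/region values taken from the 0.14.x config docs:
   
   ```
    -- single-writer case: an in-process lock avoids file-based locking entirely
    'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.InProcessLockProvider'
   
    -- multi-writer alternative (needs hudi-aws on the classpath and an existing
    -- DynamoDB table; the values below are placeholders):
    -- 'hoodie.write.lock.provider' = 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
    -- 'hoodie.write.lock.dynamodb.table' = 'hudi_locks',
    -- 'hoodie.write.lock.dynamodb.partition_key' = 'my_cow_table',
    -- 'hoodie.write.lock.dynamodb.region' = 'us-west-2'
   ```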
   
   **Environment Description**
   * Hudi version : 0.14.1
    
   * EMR version : 6.15.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.6
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Flink version : 1.17.1
   
   * Running on Docker? (yes/no) : No
   
   