[
https://issues.apache.org/jira/browse/HUDI-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sambhav gupta updated HUDI-4154:
--------------------------------
Description:
When trying to write Hudi tables to MinIO (S3) via Flink SQL, we run into the following issue.
The configuration is as follows:
1) MinIO (S3) running on localhost:9000 - latest Docker image
2) Flink 1.13.6
3) Hudi: hudi-flink-bundle_2.11-0.10.1.jar
4) core-site.xml already set with the S3 access key, secret key, and endpoint properties
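For reference, the relevant core-site.xml entries for a local MinIO endpoint might look like the following sketch; the property names are the standard Hadoop S3A keys, and the endpoint and credential values are assumptions matching the setup above:

```xml
<configuration>
  <!-- S3A client configuration for a local MinIO endpoint (values assumed from this setup) -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://localhost:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>minioadmin</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>minioadmin</value>
  </property>
  <!-- MinIO is typically addressed path-style rather than virtual-host-style -->
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```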
When we create a MOR Hudi table as follows and try to insert a record, the job fails:
> CREATE TABLE t1s3hudi (id INT PRIMARY KEY, name VARCHAR(50)) WITH (
>   'connector' = 'hudi',
>   'path' = 's3a://test123/t1s3hudi',
>   'table.type' = 'MERGE_ON_READ',
>   'hoodie.aws.access.key' = 'minioadmin',
>   'hoodie.aws.secret.key' = 'minioadmin'
> );
> INSERT INTO t1s3hudi VALUES (1, 'one number s3');
The exception that we get in error logs is:
*Caused by: org.apache.hudi.exception.HoodieException: Error while checking
whether table exists under path:s3a://test123/t1s3hudi*
*at org.apache.hudi.util.StreamerUtil.tableExists(StreamerUtil.java:292)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.util.StreamerUtil.initTableIfNotExists(StreamerUtil.java:258)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:164)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:194)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:592)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:955)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:873)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:383)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*... 18 more*
*Caused by: java.nio.file.AccessDeniedException:
s3a://test123/t1s3hudi/.hoodie: getFileStatus on
s3a://test123/t1s3hudi/.hoodie:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XAJMZTMQDGHRWZS8;
S3 Extended Request ID:
qaTd5xTZCvnRwThI9fTSeuWVuzXpuw9H6w7roFGBnBVNQmHe1O7mgHbzEZmEIKNp/bx3Iyb9/Kc=;
Proxy: null), S3 Extended Request ID:
qaTd5xTZCvnRwThI9fTSeuWVuzXpuw9H6w7roFGBnBVNQmHe1O7mgHbzEZmEIKNp/bx3Iyb9/Kc=:403
Forbidden*
*at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:218)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2184)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2970)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hudi.util.StreamerUtil.tableExists(StreamerUtil.java:290)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.util.StreamerUtil.initTableIfNotExists(StreamerUtil.java:258)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:164)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:194)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:592)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:955)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:873)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:383)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*... 18 more*
*Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden
(Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
XAJMZTMQDGHRWZS8; S3 Extended Request ID:
qaTd5xTZCvnRwThI9fTSeuWVuzXpuw9H6w7roFGBnBVNQmHe1O7mgHbzEZmEIKNp/bx3Iyb9/Kc=;
Proxy: null)*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1338)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1235)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1232)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2169)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2970)
~[flink-s3-fs-hadoop-1.13.6.jar:1.13.6]*
*at org.apache.hudi.util.StreamerUtil.tableExists(StreamerUtil.java:290)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.util.StreamerUtil.initTableIfNotExists(StreamerUtil.java:258)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:164)
~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]*
*at
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:194)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:592)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:955)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.startJobExecution(JobMaster.java:873)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.jobmaster.JobMaster.onStart(JobMaster.java:383)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
*at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:605)
~[flink-dist_2.11-1.13.6.jar:1.13.6]*
We have tried creating a 'csv' table on the same S3 path using the 'filesystem' connector, and we were able to write to the MinIO (S3) bucket with the same credentials provided in core-site.xml.
Surprisingly, the same error above also occurs when inserting records into the Hudi table even when the MinIO (S3) server is not running, which suggests the S3A client used by the Hudi writer may not be picking up the configured endpoint at all.
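For comparison, the filesystem-connector table that did write successfully would have been created roughly like this (the table name and column list here are a hypothetical reconstruction; only the connector, path scheme, and csv format come from the report above):

```sql
-- Hypothetical reconstruction of the csv/filesystem table that wrote to MinIO successfully
CREATE TABLE t1s3csv (id INT, name VARCHAR(50)) WITH (
  'connector' = 'filesystem',
  'path' = 's3a://test123/t1s3csv',
  'format' = 'csv'
);
INSERT INTO t1s3csv VALUES (1, 'one number s3');
```

That this path works while the Hudi sink gets a 403 points the investigation at how the Hudi writer builds its Hadoop FileSystem, rather than at the credentials themselves.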
> Unable to write HUDI Tables to S3 via Flink SQL
> -----------------------------------------------
>
> Key: HUDI-4154
> URL: https://issues.apache.org/jira/browse/HUDI-4154
> Project: Apache Hudi
> Issue Type: Bug
> Components: connectors
> Reporter: sambhav gupta
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.7#820007)