waterlx commented on issue #1349:
URL: https://github.com/apache/iceberg/issues/1349#issuecomment-676468945
@aokolnychyi I used a Spark job to count the records after writing into the
Iceberg table, and it did not show the expected number. By accident, I found
that the version number in version-hint.text did not match the latest metadata
JSON file.
A Flink job writes into the Iceberg table; I use MergeAppend and
Transaction.commitTransaction() to do the commit. The table instance is not
cached.
Sorry that I could not reproduce it or recall all the actions I took that day.
My guess is that the incorrect read happened because not only was
version-hint.text not updated correctly, but at least one metadata JSON file
was also not written correctly, due to improper permissions.
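To make the symptom concrete: a HadoopCatalog table keeps numbered `vN.metadata.json` files plus a `version-hint.text` file that should name the newest one, and readers that trust a stale hint see an old snapshot. Below is a minimal, hypothetical sketch (not Iceberg code; the helper names are my own) of the consistency check I did by hand, comparing the hint value against the highest metadata version present in the directory listing:

```python
import re

def latest_metadata_version(filenames):
    """Return the highest N among vN.metadata.json files, or None if absent."""
    versions = [int(m.group(1)) for f in filenames
                if (m := re.fullmatch(r"v(\d+)\.metadata\.json", f))]
    return max(versions, default=None)

def hint_matches(hint_text, filenames):
    """True when version-hint.text points at the newest metadata file."""
    return int(hint_text.strip()) == latest_metadata_version(filenames)

# The mismatch I observed, in miniature: the directory already has v3,
# but the hint still says 2, so readers load the older metadata.
files = ["v1.metadata.json", "v2.metadata.json", "v3.metadata.json"]
print(hint_matches("2", files))  # False -> stale hint, incorrect read
print(hint_matches("3", files))  # True  -> hint is consistent
```

In my case the hint lagged behind because the `create` call for `version-hint.text` failed with a permission error after the metadata commit, which matches the stack trace below.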
Another thing I would like to mention (though it might not be related to the
incorrect read) is that the java.io.FileNotFoundException in the description
was triggered on purpose to reproduce the error in my dev environment. The
actual error on our production system is the following
org.apache.hadoop.ipc.RemoteException (we enabled Ranger against Hadoop):
```
org.apache.hadoop.ipc.RemoteException(org.apache.ranger.authorization.hadoop.exceptions.RangerAccessControlException): Permission denied: user=u_teg_tdbank, access=WRITE, inode="/xxxx/version-hint.text"
    at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:442)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1663)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1647)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1597)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.resolvePathForStartFile(FSDirWriteFileOp.java:305)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2284)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2227)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createOriginal(NameNodeRpcServer.java:745)
    at org.apache.hadoop.hdfs.server.namenode.ProtectionManager.create(ProtectionManager.java:326)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:715)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:866)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:809)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2248)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2574)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
    at org.apache.hadoop.ipc.Client.call(Client.java:1435)
    at org.apache.hadoop.ipc.Client.call(Client.java:1345)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy10.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:307)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy11.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:266)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1308)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1249)
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:484)
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:481)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:495)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:422)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:946)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:927)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:824)
    at org.apache.iceberg.hadoop.HadoopTableOperations.writeVersionHint(HadoopTableOperations.java:273)
    at org.apache.iceberg.hadoop.HadoopTableOperations.commit(HadoopTableOperations.java:162)
    at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:344)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:403)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:188)
    at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:329)
    at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:220)
```