[
https://issues.apache.org/jira/browse/HDDS-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng reassigned HDDS-10734:
---------------------------------
Assignee: Siyao Meng
> [Hbase Ozone] ImportTSV fails during OM Rolling Restart with
> "SecretManager$InvalidToken: Tampered/Invalid token."
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-10734
> URL: https://issues.apache.org/jira/browse/HDDS-10734
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Reporter: Pratyush Bhatt
> Assignee: Siyao Meng
> Priority: Major
>
> Triggering ImportTSV during an OM rolling restart fails.
> Debugging shows that the failure is reproducible every time the "reducers" are
> in use by ImportTSV while an OM rolling restart stage is in progress.
> {code:java}
> 2024-04-22 10:15:41,159|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:41 INFO
> mapreduce.Job: map 100% reduce 69%
> 2024-04-22 10:15:43,169|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:43 INFO
> mapreduce.Job: map 100% reduce 70%
> 2024-04-22 10:15:49,198|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:49 INFO
> mapreduce.Job: map 100% reduce 71%
> 2024-04-22 10:16:29,396|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:16:29 INFO
> mapreduce.Job: Task Id : attempt_1713778160624_0007_r_000072_0, Status :
> FAILED
> 2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Error:
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Tampered/Invalid
> token.
> 2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> 2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
> 2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:253)
> 2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:115)
> 2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:201)
> 2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
> 2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:111)
> 2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:189)
> 2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
> 2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
> 2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
> 2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
> 2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
> 2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:269)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> java.security.AccessController.doPrivileged(Native Method)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> javax.security.auth.Subject.doAs(Subject.java:422)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> Tampered/Invalid token.
> 2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1616)
> 2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.Client.call(Client.java:1562)
> 2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.Client.call(Client.java:1459)
> 2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> 2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> 2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> com.sun.proxy.$Proxy17.submitRequest(Unknown Source)
> 2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
> 2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
> run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
> java.lang.reflect.Method.invoke(Method.java:498)
> {code}
> The leader OM logs show the following:
> {code:java}
> 2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> 2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.68.1:43592:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> 2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.170.2:41974:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> 2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.133.64:46020:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> 2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.11.131:50274:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> 2024-04-22 10:16:24,675 WARN [Socket Reader #1 for port
> 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
> 10.140.11.131:50290:null (DIGEST-MD5: IO error acquiring password) with true
> cause: (OM:om102 is not the leader. Could not determine the leader node.)
> {code}