Pratyush Bhatt created HDDS-10734:
-------------------------------------
Summary: [Hbase Ozone] ImportTSV fails during OM Rolling Restart
with "SecretManager$InvalidToken: Tampered/Invalid token."
Key: HDDS-10734
URL: https://issues.apache.org/jira/browse/HDDS-10734
Project: Apache Ozone
Issue Type: Bug
Components: OM
Reporter: Pratyush Bhatt
ImportTSV fails when it is triggered during an OM rolling restart.
After debugging, the issue is reproducible every time the ImportTSV job is
running its reducers while an OM rolling restart stage is in progress.
The failed reducer attempt reports:
{code:java}
2024-04-22 10:16:29,396|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:16:29 INFO
mapreduce.Job: Task Id : attempt_1713778160624_0007_r_000072_0, Status : FAILED
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Error:
org.apache.hadoop.security.token.SecretManager$InvalidToken: Tampered/Invalid
token.
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:253)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:115)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:201)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:111)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:189)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:269)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
java.security.AccessController.doPrivileged(Native Method)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
javax.security.auth.Subject.doAs(Subject.java:422)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
Tampered/Invalid token.
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1616)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.Client.call(Client.java:1562)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.Client.call(Client.java:1459)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
com.sun.proxy.$Proxy17.submitRequest(Unknown Source)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 -
run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at
java.lang.reflect.Method.invoke(Method.java:498) {code}
The leader OM logs from the same time window show the following:
{code:java}
2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.68.1:43592:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.170.2:41974:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.133.64:46020:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.11.131:50274:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,675 WARN [Socket Reader #1 for port
9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
10.140.11.131:50290:null (DIGEST-MD5: IO error acquiring password) with true
cause: (OM:om102 is not the leader. Could not determine the leader node.) {code}
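To check how many of these authentication failures come from the OM not being
the leader, as opposed to a token that is actually invalid, the OM log can be
scanned for the "true cause" field. The following is a minimal sketch, not part
of Ozone or Hadoop: the class name OmAuthFailureScan, its parse() method, and
the regular expression are assumptions based only on the log format shown
above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; the names here are illustrative and do not
// exist in Ozone or Hadoop.
final class OmAuthFailureScan {

    /** One "Auth failed" entry from the OM SecurityLogger output. */
    static final class AuthFailure {
        final String client;     // caller address as host:port
        final String saslError;  // e.g. "DIGEST-MD5: IO error acquiring password"
        final String trueCause;  // e.g. "OM:om102 is not the leader. ..."

        AuthFailure(String client, String saslError, String trueCause) {
            this.client = client;
            this.saslError = saslError;
            this.trueCause = trueCause;
        }

        /** True when the logged cause says the OM was not the leader. */
        boolean causedByLeaderChange() {
            return trueCause.contains("is not the leader");
        }
    }

    // Assumed format, taken from the log excerpt in this report:
    // "Auth failed for <ip>:<port>:<user> (<sasl error>) with true cause: (<cause>)"
    private static final Pattern AUTH_FAILED = Pattern.compile(
        "Auth failed for (\\d+(?:\\.\\d+){3}:\\d+):\\S* \\(([^)]*)\\)"
            + " with true cause: \\(([^)]*)\\)");

    /** Extracts every auth failure from log text, tolerating wrapped lines. */
    static List<AuthFailure> parse(String logText) {
        // Collapse line wraps and repeated spaces so one regex can match
        // entries that were split across several lines.
        String flat = logText.replaceAll("\\s+", " ");
        Matcher m = AUTH_FAILED.matcher(flat);
        List<AuthFailure> failures = new ArrayList<>();
        while (m.find()) {
            failures.add(new AuthFailure(m.group(1), m.group(2), m.group(3)));
        }
        return failures;
    }

    public static void main(String[] args) {
        String sample =
            "2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port\n"
            + "9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for\n"
            + "10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) with true\n"
            + "cause: (OM:om102 is not the leader. Could not determine the leader node.)\n";
        for (AuthFailure f : parse(sample)) {
            System.out.println(f.client + " leaderChange=" + f.causedByLeaderChange()
                + " cause=" + f.trueCause);
        }
    }
}
```

If every entry in the restart window is flagged by causedByLeaderChange(), that
would suggest the "Tampered/Invalid token" error seen by the reducers is a
side effect of the leader change rather than a genuinely invalid token.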