Pratyush Bhatt created HDDS-10734:
-------------------------------------

             Summary: [Hbase Ozone] ImportTSV fails during OM Rolling Restart 
with "SecretManager$InvalidToken: Tampered/Invalid token."
                 Key: HDDS-10734
                 URL: https://issues.apache.org/jira/browse/HDDS-10734
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM
            Reporter: Pratyush Bhatt


ImportTSV fails when it is triggered during an OM rolling restart.

I debugged the issue: it is reproducible every time ImportTSV uses "reducers" 
while an OM rolling restart stage is in progress.
{code:java}
2024-04-22 10:16:29,396|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:16:29 INFO mapreduce.Job: Task Id : attempt_1713778160624_0007_r_000072_0, Status : FAILED
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Error: org.apache.hadoop.security.token.SecretManager$InvalidToken: Tampered/Invalid token.
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:253)
2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:115)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:201)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:111)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:189)
2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:269)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.security.AccessController.doPrivileged(Native Method)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at javax.security.auth.Subject.doAs(Subject.java:422)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Tampered/Invalid token.
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1616)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.call(Client.java:1562)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.call(Client.java:1459)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at com.sun.proxy.$Proxy17.submitRequest(Unknown Source)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.lang.reflect.Method.invoke(Method.java:498)
{code}
I checked the leader OM logs, which show the following:
{code:java}
2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.68.1:43592:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.170.2:41974:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.133.64:46020:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.11.131:50274:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
2024-04-22 10:16:24,675 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.11.131:50290:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
{code}
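For triage, the OM log entries above can be grouped by their "true cause" to confirm that the auth failures seen by the reducers line up with the leader transition. The following is a minimal sketch, not part of Ozone: the function name `summarize_auth_failures` and the regular expression are illustrative assumptions, and the script assumes each log entry has been joined onto a single line.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not an Ozone or Hadoop API) for the
# SecurityLogger "Auth failed" entries quoted above, one entry per line.
AUTH_FAILED = re.compile(
    r"Auth failed for (?P<client>[\d.]+):(?P<port>\d+):\S+ "
    r"\((?P<sasl_error>[^)]*)\) with true cause: \((?P<cause>.*)\)"
)


def summarize_auth_failures(log_lines):
    """Count auth failures per (true cause, client IP) pair."""
    counts = Counter()
    for line in log_lines:
        match = AUTH_FAILED.search(line)
        if match:
            counts[(match.group("cause"), match.group("client"))] += 1
    return counts


if __name__ == "__main__":
    # One entry copied from the OM log above, joined onto a single line.
    sample = [
        "2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port 9862]"
        "-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for "
        "10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) "
        "with true cause: (OM:om102 is not the leader. "
        "Could not determine the leader node.)",
    ]
    for (cause, client), count in summarize_auth_failures(sample).items():
        print(f"{count} failure(s) from {client}: {cause}")
```

If every failure in the restart window reports "is not the leader" as its true cause, that suggests the InvalidToken error on the client side is a symptom of the leader change rather than of a genuinely tampered token.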


