[
https://issues.apache.org/jira/browse/HDDS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sammi Chen updated HDDS-6685:
-----------------------------
Description:
The root cause of this issue is that omRpcServer in OzoneManager holds a
reference to delegationTokenMgr.
For every incoming RPC request, omRpcServer calls delegationTokenMgr to
validate the S3AuthInfo (all customer data is ingested through S3G) before
checking whether this OM instance is the leader.
During installCheckpoint, new metadataManager and delegationTokenMgr instances
are created while omRpcServer still holds the old delegationTokenMgr reference.
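The stale reference described above can be illustrated with a minimal Java sketch. This is not Ozone code: the classes TokenManager, RpcServer and Manager are hypothetical stand-ins for delegationTokenMgr, omRpcServer and OzoneManager, and the stopped manager throws an exception here only to make the problem visible (the real failure is the crash recorded in the attached log).

```java
public class StaleReferenceSketch {
    /** Hypothetical stand-in for delegationTokenMgr. */
    static class TokenManager {
        private final String name;
        private boolean stopped;

        TokenManager(String name) { this.name = name; }

        void stop() { stopped = true; }

        String validate(String s3AuthInfo) {
            if (stopped) {
                // Assumption: a stopped manager fails loudly in this sketch.
                throw new IllegalStateException("token manager " + name + " is stopped");
            }
            return name + " validated " + s3AuthInfo;
        }
    }

    /** Hypothetical stand-in for omRpcServer: keeps the reference it was built with. */
    static class RpcServer {
        private final TokenManager tokenManager;

        RpcServer(TokenManager tokenManager) { this.tokenManager = tokenManager; }

        String handle(String s3AuthInfo) { return tokenManager.validate(s3AuthInfo); }
    }

    /** Hypothetical stand-in for OzoneManager. */
    static class Manager {
        TokenManager tokenManager = new TokenManager("old");
        final RpcServer rpcServer = new RpcServer(tokenManager);

        void installCheckpoint() {
            tokenManager.stop();
            // Only this field is updated; rpcServer still points at the old instance.
            tokenManager = new TokenManager("new");
        }
    }

    /** Returns what a request sees after the checkpoint has been installed. */
    static String demo() {
        Manager om = new Manager();
        om.installCheckpoint();
        try {
            return om.rpcServer.handle("s3-auth");
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the request is still validated by the "old" instance even though the owner has already switched to the "new" one.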
So, to get a clean context, one possible solution is to stop omRpcServer before
metadataManager is stopped. After the checkpoint is installed, recreate
metadataManager and delegationTokenMgr, and then start a new omRpcServer.
However, this solution will cause an IOException when the leader OM or a
client (S3G) tries to send requests to this OM. It is not yet clear how large
the impact will be.
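A rough Java sketch of the proposed ordering follows. It is an illustration under assumptions, not the actual patch: TokenManager, RpcServer, Manager, tryHandle and the duringInstall hook are all hypothetical names, and the metadataManager steps are reduced to comments.

```java
import java.io.IOException;

public class RestartOrderSketch {
    /** Hypothetical stand-in for delegationTokenMgr; generation counts re-creations. */
    static class TokenManager {
        final int generation;

        TokenManager(int generation) { this.generation = generation; }
    }

    /** Hypothetical stand-in for omRpcServer. */
    static class RpcServer {
        private final TokenManager tokenManager;
        private volatile boolean running;

        RpcServer(TokenManager tokenManager) { this.tokenManager = tokenManager; }

        void start() { running = true; }

        void stop() { running = false; }

        String handle(String request) throws IOException {
            if (!running) {
                throw new IOException("RPC server is not running");
            }
            return "gen" + tokenManager.generation + ":" + request;
        }
    }

    /** Hypothetical stand-in for OzoneManager. */
    static class Manager {
        private int generation = 1;
        private TokenManager tokenManager = new TokenManager(generation);
        private volatile RpcServer rpcServer = new RpcServer(tokenManager);

        Manager() { rpcServer.start(); }

        /** Returns the response, or "IOException" if the server rejected the call. */
        String tryHandle(String request) {
            try {
                return rpcServer.handle(request);
            } catch (IOException e) {
                return "IOException";
            }
        }

        /** duringInstall is a hook (an assumption) to simulate a request arriving mid-install. */
        void installCheckpoint(Runnable duringInstall) {
            rpcServer.stop();          // 1. stop the RPC server first
            // 2. stop the metadata manager (omitted in this sketch)
            duringInstall.run();       // 3. checkpoint is installed; callers fail here
            generation++;
            tokenManager = new TokenManager(generation); // 4. recreate the managers
            rpcServer = new RpcServer(tokenManager);     // 5. new server, fresh reference
            rpcServer.start();
        }
    }

    public static void main(String[] args) {
        Manager om = new Manager();
        System.out.println(om.tryHandle("put"));
        om.installCheckpoint(() -> System.out.println(om.tryHandle("put")));
        System.out.println(om.tryHandle("put"));
    }
}
```

The middle call is the trade-off mentioned above: while the server is stopped, requests from the leader OM or S3G fail with an IOException instead of reaching a stale delegationTokenMgr.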
[^hs_err_pid716.log]
was:[^hs_err_pid716.log]
> Follower OM crashed when validating S3 auth info.
> -------------------------------------------------
>
> Key: HDDS-6685
> URL: https://issues.apache.org/jira/browse/HDDS-6685
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
> Attachments: hs_err_pid716.log