[ 
https://issues.apache.org/jira/browse/HDDS-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-6685:
-----------------------------
    Description: 
The root cause of this issue is that omRpcServer in OzoneManager holds a 
reference to delegationTokenMgr. 

For every incoming RPC request, omRpcServer calls delegationTokenMgr to 
validate the S3AuthInfo (all customer data is ingested through S3G) before 
checking the leadership of this OM instance.

During installCheckpoint, new metadataManager and delegationTokenMgr instances 
are created, while omRpcServer still holds the old delegationTokenMgr reference.
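
The stale-reference pattern described above can be illustrated with a minimal 
sketch. The classes TokenManager, RpcServer and StaleReferenceDemo are 
hypothetical stand-ins, not the real Ozone classes, and the closed manager 
throws an exception here only to make the failure visible; the actual OM 
failure mode is the one recorded in the attached log.

```java
// Hypothetical stand-in for delegationTokenMgr; not the real Ozone class.
final class TokenManager {
    private final String generation;
    private volatile boolean closed;

    TokenManager(String generation) {
        this.generation = generation;
    }

    void close() {
        closed = true;
    }

    String validate(String authInfo) {
        if (closed) {
            throw new IllegalStateException("token manager " + generation + " is closed");
        }
        return generation + " accepted " + authInfo;
    }
}

// Hypothetical stand-in for omRpcServer: it captures the manager once, at construction.
final class RpcServer {
    private final TokenManager tokenManager;

    RpcServer(TokenManager tokenManager) {
        this.tokenManager = tokenManager;
    }

    String handle(String authInfo) {
        // Validation happens on whatever instance was captured at construction.
        return tokenManager.validate(authInfo);
    }
}

final class StaleReferenceDemo {
    static String run() {
        TokenManager oldManager = new TokenManager("gen1");
        RpcServer server = new RpcServer(oldManager);

        // Simulated installCheckpoint: the old manager is shut down and a new one
        // is created, but the RPC server is never given the new instance.
        oldManager.close();
        TokenManager newManager = new TokenManager("gen2");

        String direct = newManager.validate("s3-auth"); // the new instance works
        try {
            return server.handle("s3-auth");
        } catch (IllegalStateException e) {
            return "rpc failed: " + e.getMessage() + "; direct: " + direct;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the new manager validates the request, while the request routed 
through the server still reaches the closed "gen1" instance.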

 

So, to get a clean context, one possible solution is to stop omRpcServer before 
metadataManager is stopped. After the checkpoint is installed, recreate 
metadataManager and delegationTokenMgr, and then start a new omRpcServer. 
However, this solution will cause an IOException when the leader OM or a 
client (S3G) tries to send requests to this OM while its RPC server is down. 
It is not clear how large the impact will be.
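
A rough sketch of the proposed ordering is shown below, under stated 
assumptions: ReloadSketch, Manager and RpcServer are hypothetical names, the 
checkpoint installation itself is omitted, and a stopped server is modelled 
simply as one that throws IOException. It is meant only to show the ordering 
and its trade-off, not the actual OzoneManager code.

```java
import java.io.IOException;

final class ReloadSketch {
    // Hypothetical stand-in for metadataManager and delegationTokenMgr.
    static final class Manager {
        final int generation;

        Manager(int generation) {
            this.generation = generation;
        }
    }

    // Hypothetical stand-in for omRpcServer.
    static final class RpcServer {
        private final Manager tokenManager;
        private volatile boolean running = true;

        RpcServer(Manager tokenManager) {
            this.tokenManager = tokenManager;
        }

        void stop() {
            running = false;
        }

        int handle() throws IOException {
            if (!running) {
                throw new IOException("RPC server is stopped");
            }
            return tokenManager.generation;
        }
    }

    private Manager metadataManager = new Manager(1);
    private Manager tokenManager = new Manager(1);
    private RpcServer rpcServer = new RpcServer(tokenManager);

    RpcServer currentServer() {
        return rpcServer;
    }

    void installCheckpoint() {
        // 1. Stop the RPC server first, so no request can reach the old managers.
        rpcServer.stop();
        // 2. Stop the old metadataManager and install the checkpoint (omitted here).
        // 3. Recreate both managers.
        int next = metadataManager.generation + 1;
        metadataManager = new Manager(next);
        tokenManager = new Manager(next);
        // 4. Start a new RPC server bound to the new token manager.
        rpcServer = new RpcServer(tokenManager);
    }

    static String demo() {
        ReloadSketch om = new ReloadSketch();
        RpcServer before = om.currentServer();
        om.installCheckpoint();

        String oldResult;
        try {
            oldResult = "generation " + before.handle();
        } catch (IOException e) {
            oldResult = "IOException: " + e.getMessage();
        }

        String newResult;
        try {
            newResult = "generation " + om.currentServer().handle();
        } catch (IOException e) {
            newResult = "IOException: " + e.getMessage();
        }
        return "old server -> " + oldResult + "; new server -> " + newResult;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off mentioned above shows up directly: a caller that reaches the 
stopped server gets an IOException, while the new server only ever sees the 
recreated managers.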

 

 

 

[^hs_err_pid716.log]

  was:[^hs_err_pid716.log]


> Follower OM crashed when validating S3 auth info.
> -------------------------------------------------
>
>                 Key: HDDS-6685
>                 URL: https://issues.apache.org/jira/browse/HDDS-6685
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: hs_err_pid716.log
>


