[ 
https://issues.apache.org/jira/browse/HDDS-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325063#comment-17325063
 ] 

Bharat Viswanadham edited comment on HDDS-5116 at 4/19/21, 2:15 PM:
--------------------------------------------------------------------

Thanks [~adoroszlai] for reporting this.
I think we might need a fix similar to HDDS-5058, where we saw a similar issue of 
OM failing to initialize because it was unable to getScmInfo.
Can we use a similar solution here, or is there another option?

cc [~msingh] [~shashikant] [~xyao]
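
For illustration only, here is a minimal sketch of the general "retry until SCM is reachable" idea. It is not the HDDS-5058 patch and does not use Ozone APIs: the class ScmRetrySketch, the method retryOnConnectFailure, and the parameters maxAttempts and sleepMillis are all hypothetical, and the Callable stands in for whatever call fetches the SCM-signed certificate.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Ozone. */
final class ScmRetrySketch {

  /**
   * Invokes the call until it succeeds, retrying only when the remote
   * end refuses the connection. Any other exception is propagated at once.
   * After maxAttempts failures the last ConnectException is rethrown.
   */
  static <T> T retryOnConnectFailure(Callable<T> call, int maxAttempts,
      long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    ConnectException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (ConnectException e) {
        last = e;
        if (attempt < maxAttempts) {
          // Wait before the next attempt instead of exiting the service.
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    // Simulated SCM that refuses the first two connections.
    int[] calls = {0};
    String cert = retryOnConnectFailure(() -> {
      if (++calls[0] < 3) {
        throw new ConnectException("Connection refused");
      }
      return "signed-cert";
    }, 5, 10L);
    // prints "signed-cert after 3 attempts"
    System.out.println(cert + " after " + calls[0] + " attempts");
  }
}
```

Whether the datanode should retry for a bounded number of attempts, as above, or wait indefinitely for SCM is a design choice this sketch leaves open.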





> Secure datanode may exit if cannot connect to SCM
> -------------------------------------------------
>
>                 Key: HDDS-5116
>                 URL: https://issues.apache.org/jira/browse/HDDS-5116
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode, SCM HA, Security
>            Reporter: Attila Doroszlai
>            Assignee: Bharat Viswanadham
>            Priority: Critical
>
> Intermittent failure in secure acceptance tests indicates that datanode may 
> fail to start up if SCM is not yet ready:
> {noformat}
> datanode_3  | STARTUP_MSG: Starting HddsDatanodeService
> ...
> datanode_3  | 2021-04-19 08:20:29,030 [main] INFO ozone.HddsDatanodeService: 
> Creating csr for DN-> subject:dn@627dcb55b990
> ...
> datanode_3  | 2021-04-19 08:20:57,660 [main] INFO 
> retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> java.net.ConnectException: Call From 627dcb55b990/172.26.0.4 to scm:9961 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy18.submitRequest over nodeId=scmNodeId,nodeAddress=scm/172.26.0.10:9961 
> after 14 failover attempts. Trying to failover after sleeping for 2000ms.
> datanode_3  | 2021-04-19 08:20:59,667 [main] ERROR ozone.HddsDatanodeService: 
> Error while storing SCM signed certificate.
> ...
> datanode_3  |         at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:104)
> datanode_3  |         at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:263)
> datanode_3  |         at 
> org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:349)
> datanode_3  |         at 
> org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:320)
> datanode_3  |         at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:248)
> datanode_3  |         at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:192)
> ...
> datanode_3  | SHUTDOWN_MSG: Shutting down HddsDatanodeService at 
> 627dcb55b990/172.26.0.4
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
