bharatviswa504 opened a new pull request #2162:
URL: https://github.com/apache/ozone/pull/2162


   ## What changes were proposed in this pull request?
   
   The following changes are made:
   1. Datanode: use the max retry count so that the Datanode retries forever 
during startup to get a signed cert from SCM.
   2. OM/SCM: use a fixed retry duration so that the end user running 
init/bootstrap gets a response.
   3. Use the max retry count for fetching the CA list, which is required 
during DN/OM startup.
   4. Use the max retry count for getting a certificate from SCM, which is used 
in BlockToken/OMToken verification when the cert is not in the local cache.
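
The two retry styles in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this patch: the class `RetrySketch` and both method names are hypothetical, and it is independent of the Hadoop `RetryInvocationHandler` seen in the logs. It assumes a count-bounded retry called with `Integer.MAX_VALUE` approximates "retry forever", and that a duration-bounded retry gives up once a time budget is spent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of count-bounded vs. duration-bounded retries. */
final class RetrySketch {

  private RetrySketch() { }

  /**
   * Retries the call until it succeeds or maxRetries failures have been
   * retried. Passing Integer.MAX_VALUE makes this effectively unbounded.
   */
  static <T> T retryUpToCount(Callable<T> call, int maxRetries, long sleepMs)
      throws Exception {
    int retries = 0;
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        if (retries >= maxRetries) {
          throw e; // retry budget exhausted, surface the last failure
        }
        retries++;
        Thread.sleep(sleepMs);
      }
    }
  }

  /**
   * Retries the call until it succeeds or maxDurationMs has elapsed,
   * so an interactive caller gets an answer within a bounded time.
   */
  static <T> T retryForDuration(Callable<T> call, long maxDurationMs,
      long sleepMs) throws Exception {
    long start = System.nanoTime();
    long budgetNanos = TimeUnit.MILLISECONDS.toNanos(maxDurationMs);
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        if (System.nanoTime() - start >= budgetNanos) {
          throw e; // time budget spent, report the failure to the user
        }
        Thread.sleep(sleepMs);
      }
    }
  }
}
```

In this sketch a Datanode-style caller would use `retryUpToCount(call, Integer.MAX_VALUE, 2000)`, while an init/bootstrap-style caller would use `retryForDuration` with whatever time limit is acceptable for an interactive command.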
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5116
   
   ## How was this patch tested?
   
   Tested manually: started OM/DN before SCM startup, and they keep retrying 
beyond the default retry count of 15.
   
   ```
   om1_1        | 2021-04-20 07:15:09,675 [main] INFO 
retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
java.net.NoRouteToHostException: No Route to Host from  om1/172.25.0.111 to 
scm1.org:9863 failed on socket timeout exception: 
java.net.NoRouteToHostException: No route to host; For more details see:  
http://wiki.apache.org/hadoop/NoRouteToHost, while invoking $Proxy31.send over 
nodeId=scm1,nodeAddress=scm1.org/172.25.0.116:9863 after 45 failover attempts. 
Trying to failover after sleeping for 2000ms.
   ```
   
   ```
   datanode1_1  | 2021-04-20 07:15:35,048 [main] INFO 
retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
java.net.ConnectException: Call From 9cb343c107ed/172.25.0.102 to scm3.org:9961 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while 
invoking $Proxy17.submitRequest over 
nodeId=scm3,nodeAddress=scm3.org/172.25.0.118:9961 after 35 failover attempts. 
Trying to failover after sleeping for 2000ms.
   ```
   
   Once SCM is up, DN and OM start up successfully.
   




