sadanand48 opened a new pull request, #4325: URL: https://github.com/apache/ozone/pull/4325
## What changes were proposed in this pull request?

When an incorrect hostname or service ID is given in an ofs URI, the filesystem client does not fail; it keeps retrying until the retry limit is exhausted. The default client retry count is also too high (500). Assuming a linear retry policy, the client would take ((1 + 500) * 500 * 2)/2 = 250500 seconds ~= 70 hours to stop retrying.

This PR reduces the retry count from 500 to 45 (based on the DfsClient code defaults).

```bash
$ ozone fs -ls ofs://ozone2/
23/02/28 07:02:35 WARN ha.OMProxyInfo: OzoneManager address ozone2:9862 for serviceID null remains unresolved for node ID null Check your ozone-site.xml file to ensure ozone manager addresses are configured properly.
23/02/28 07:02:38 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: "xxx"; destination host is: "ozone2":9862; java.net.UnknownHostException: Invalid host name: local host is: "ozone"; destination host is: "ozone2":9862; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking $Proxy11.submitRequest over nodeId=null,nodeAddress=ozone2:9862 after 1 failover attempts. Trying to failover after sleeping for 4000ms.
23/02/28 07:02:42 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: "xxx"; destination host is: "ozone2":9862; java.net.UnknownHostException: Invalid host name: local host is: "ozone"; destination host is: "ozone2":9862; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking $Proxy11.submitRequest over nodeId=null,nodeAddress=ozone2:9862 after 2 failover attempts. Trying to failover after sleeping for 6000ms.
23/02/28 07:02:48 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: "xxx"; destination host is: "ozone2":9862; java.net.UnknownHostException: Invalid host name: local host is: "ozone"; destination host is: "ozone2":9862; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking $Proxy11.submitRequest over nodeId=null,nodeAddress=ozone2:9862 after 3 failover attempts. Trying to failover after sleeping for 8000ms.
```

This PR also adds a check that tries to resolve the hostname. If resolution fails, the operation fails immediately and is not retried.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8041

## How was this patch tested?

Unit tests and manual testing.

After this PR:

```bash
$ ozone fs -ls ofs://ozone2/
-ls: Cannot resolve OM host ozone2 in the URI
```
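
The fail-fast check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `OmHostCheck` and the method `checkResolvable` are hypothetical names, and the sketch ignores service IDs entirely (a real check would presumably have to skip hosts that match a configured OM service ID). It uses only `java.net.URI` and `java.net.InetAddress.getByName`, and reuses the error text shown in the output above.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Ozone: rejects a URI whose host cannot be resolved.
final class OmHostCheck {

    // Throws IllegalArgumentException instead of letting the caller enter a retry loop.
    static void checkResolvable(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URI " + uri);
        }
        try {
            // One lookup up front; an unresolvable name is treated as a permanent error.
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(
                "Cannot resolve OM host " + host + " in the URI", e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create(args.length > 0 ? args[0] : "ofs://localhost/");
        checkResolvable(uri);
        System.out.println("Resolved OM host in " + uri);
    }
}
```

Doing the lookup once before the first RPC means a typo in the URI surfaces as a single error rather than as a long series of failover attempts.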

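
To make the retry arithmetic above easy to re-check, here is a small sketch of the same estimate. It assumes, as the description does, a linear policy in which the n-th retry sleeps n times a fixed step of 2 seconds; the class and method names are hypothetical. The log excerpt (4000ms after the first failover attempt) suggests the real sleeps may be offset by one step, so the figures are approximate.

```java
// Hypothetical helper for estimating the worst-case wait under a linear retry policy.
final class LinearRetryBudget {

    // Sum over n = 1..maxRetries of (n * stepSeconds) = stepSeconds * N * (N + 1) / 2.
    static long totalSleepSeconds(int maxRetries, long stepSeconds) {
        return stepSeconds * maxRetries * (maxRetries + 1L) / 2;
    }

    public static void main(String[] args) {
        for (int retries : new int[] {500, 45}) {
            long seconds = totalSleepSeconds(retries, 2);
            System.out.println(retries + " retries -> " + seconds + " s ("
                + seconds / 3600.0 + " h)");
        }
    }
}
```

Under the same assumption, 500 retries give 250500 seconds, while 45 retries give 2070 seconds, which is roughly 35 minutes.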