[ https://issues.apache.org/jira/browse/KNOX-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605758#comment-16605758 ]

Matthew Sharp commented on KNOX-1436:
-------------------------------------

With the attached patch, failoverRequest now logs the host that initially 
failed, which helps with troubleshooting. It also waits until markFailedURL has 
been called before logging the next host it is failing over to.

Example:

 
{code:java}
2018-09-06 07:45:25,197 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:45:25,198 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failed to connect to: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:26,200 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(134)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:26,294 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:45:26,295 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failed to connect to: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:27,296 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(134)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:27,316 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:45:27,316 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failed to connect to: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:28,317 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(134)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:28,326 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:45:28,326 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failed to connect to: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:45:28,326 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(137)) - Maximum attempts 3 to failover reached for service: WEBHDFS
{code}
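The ordering in the patched log (failed host first, then the next host) can be illustrated with a minimal, self-contained sketch. RoundRobinUrls, FailoverSketch, getActiveUrl and markFailedUrl are hypothetical stand-ins loosely modeled on the markFailedURL behavior discussed in this issue; they are not Knox's actual classes. The sketch assumes every request fails and omits the sleep between attempts.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an HA URL manager (not Knox's API): it keeps an
// ordered list of URLs and an index pointing at the currently active one.
class RoundRobinUrls {
    private final List<String> urls;
    private int active = 0;

    RoundRobinUrls(List<String> urls) {
        this.urls = new ArrayList<>(urls);
    }

    String getActiveUrl() {
        return urls.get(active);
    }

    // Marking the active URL as failed advances to the next URL, wrapping around.
    void markFailedUrl(String url) {
        if (url.equals(getActiveUrl())) {
            active = (active + 1) % urls.size();
        }
    }
}

// Hypothetical sketch of the patched ordering: log the URL that failed,
// mark it failed, and only then log the URL the request is moving to.
class FailoverSketch {
    // Assumes every request fails, so the loop ends only when maxAttempts is hit.
    static List<String> run(RoundRobinUrls ha, int maxAttempts) {
        List<String> log = new ArrayList<>();
        int attempts = 0;
        while (true) {
            String failed = ha.getActiveUrl();
            log.add("Failed to connect to: " + failed);
            if (attempts >= maxAttempts) {
                log.add("Maximum attempts " + maxAttempts + " to failover reached");
                return log;
            }
            attempts++;
            ha.markFailedUrl(failed);
            // Logged after markFailedUrl, so this names the next URL, not the failed one.
            log.add("Failing over request to a different server: " + ha.getActiveUrl());
        }
    }

    public static void main(String[] args) {
        RoundRobinUrls ha = new RoundRobinUrls(List.of(
                "http://host1.example.com:50070", "http://host2.example.com:50070"));
        run(ha, 3).forEach(System.out::println);
    }
}
{code}

With two URLs and maxAttempts of 3, run returns eight messages that alternate between the failed host and the next host, ending with the maximum-attempts message; this mirrors the failoverRequest lines in the log above.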

> AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
> -----------------------------------------------------------------
>
>                 Key: KNOX-1436
>                 URL: https://issues.apache.org/jira/browse/KNOX-1436
>             Project: Apache Knox
>          Issue Type: Bug
>            Reporter: Matthew Sharp
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: KNOX-1436.patch
>
>
> The current WebHDFS failoverRequest method makes it difficult to tell which 
> host a request failed on versus which host it is retrying next.
> Example:
> {code:java}
> 2018-09-06 07:49:07,245 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:07,246 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:08,278 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:08,279 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,366 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:10,367 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,368 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to failover reached for service: WEBHDFS
> {code}
> In the example above, host1.example.com has already failed, yet the message 
> says the request is failing over to a different server and still names 
> host1.example.com.
> Suggestion:
> The HaDispatchMessages failingOverRequest log call should be moved below the 
> markFailedURL call, so that it reports the next URI the request is failing 
> over to rather than the current one, which has already failed.
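The effect of the suggested reordering can be shown with a small, self-contained sketch that builds the "failing over" message before and after marking the URL as failed. HaPair, LogOrdering, getActiveUrl and markFailedUrl are hypothetical names chosen for illustration, assuming a two-node pair where marking the active URL as failed makes the other one active; they are not Knox's actual API.

{code:java}
import java.util.List;

// Hypothetical two-node HA pair (not Knox's API): marking the active URL as
// failed makes the other URL active.
class HaPair {
    private final List<String> urls;
    private int active = 0;

    HaPair(String first, String second) {
        this.urls = List.of(first, second);
    }

    String getActiveUrl() {
        return urls.get(active);
    }

    void markFailedUrl(String url) {
        if (url.equals(getActiveUrl())) {
            active = 1 - active;
        }
    }
}

class LogOrdering {
    private static final String PREFIX = "Failing over request to a different server: ";

    // Current ordering: the message is built before markFailedUrl, so it still
    // names the URL that just failed.
    static String messageBeforeMark(HaPair ha) {
        String message = PREFIX + ha.getActiveUrl();
        ha.markFailedUrl(ha.getActiveUrl());
        return message;
    }

    // Suggested ordering: mark the failed URL first, then log the URL that will
    // actually be tried next.
    static String messageAfterMark(HaPair ha) {
        ha.markFailedUrl(ha.getActiveUrl());
        return PREFIX + ha.getActiveUrl();
    }
}
{code}

Starting from a fresh pair with host1 active, messageBeforeMark names host1 (the host that just failed), while messageAfterMark names host2, which is the host the request will actually try next.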


