[ 
https://issues.apache.org/jira/browse/KNOX-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep More updated KNOX-1436:
-------------------------------
    Status: Patch Available  (was: Open)

> AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
> -----------------------------------------------------------------
>
>                 Key: KNOX-1436
>                 URL: https://issues.apache.org/jira/browse/KNOX-1436
>             Project: Apache Knox
>          Issue Type: Bug
>            Reporter: Matthew Sharp
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: KNOX-1436.patch
>
>
> The current WebHDFS failoverRequest method makes it difficult to tell which 
> host a request failed on and which host it will be retried on next. 
> Example:
> {code:java}
> 2018-09-06 07:49:07,245 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a 
> node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:07,246 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:08,278 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a 
> node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:08,279 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:09,291 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a 
> node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:09,291 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,366 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a 
> node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:10,367 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,368 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to 
> failover reached for service: WEBHDFS
> {code}
> In the example above, host1.example.com has already failed, yet the first 
> "Failing over request to a different server" message still shows 
> host1.example.com.
> Suggestion:
> The HaDispatchMessages failingOverRequest call should be moved below the 
> markFailedURL call, so that it logs the next URI being tried rather than the 
> URI that just failed.
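
The ordering suggested in the description can be illustrated with a small, self-contained sketch. This is not the Knox code or the attached patch: the class FailoverLoggingSketch, its fields, and the wording of the log message are hypothetical, and markFailedUrl is only assumed to rotate to the next configured URL. The point is simply that the message is built after the failed URL has been marked, so it can name the next target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an HA URL provider plus dispatch; not the Knox API.
class FailoverLoggingSketch {
    private final List<String> urls;
    private int active = 0;

    // Collected log lines, kept in memory so the behavior is easy to inspect.
    final List<String> log = new ArrayList<>();

    FailoverLoggingSketch(List<String> urls) {
        this.urls = new ArrayList<>(urls);
    }

    String getActiveUrl() {
        return urls.get(active);
    }

    // Assumed behavior: marking the active URL as failed rotates to the next one.
    void markFailedUrl(String failed) {
        if (failed.equals(getActiveUrl())) {
            active = (active + 1) % urls.size();
        }
    }

    // Suggested order: mark the failure first, then log the URL that will be tried next.
    String failover() {
        String failed = getActiveUrl();
        markFailedUrl(failed);
        String next = getActiveUrl();
        log.add("Failing over request from " + failed + " to " + next);
        return next;
    }

    public static void main(String[] args) {
        List<String> hosts = new ArrayList<>();
        hosts.add("http://host1.example.com:50070");
        hosts.add("http://host2.example.com:50070");
        FailoverLoggingSketch sketch = new FailoverLoggingSketch(hosts);
        sketch.failover();
        for (String line : sketch.log) {
            System.out.println(line);
        }
    }
}
```

With two hosts configured, the first failover in this sketch records host1.example.com as the failed URL and host2.example.com as the next target, which is the distinction the current log output does not make.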



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
