[jira] [Comment Edited] (KNOX-1436) AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging

Larry McCay (JIRA) Wed, 26 Sep 2018 13:55:35 -0700


    [ 
https://issues.apache.org/jira/browse/KNOX-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629391#comment-16629391
 ]


Larry McCay edited comment on KNOX-1436 at 9/26/18 8:54 PM:
------------------------------------------------------------

Hi [~MatthewSharp] - I was just reviewing this today and comparing to some of 
the other dispatch related JIRAs and reports that I have been fielding. I 
wanted to make sure that they did not contradict one another and I am convinced at 
this point that they don't.

We will be able to get this merged shortly, I think.

In general, we will merge fixes as they come in as long as the review can be 
conclusive which has been hard to do until these other issues have been fully 
understood. Thanks for your patience!

Uggh - this is the second time that I mistook this Jira for the one that 
replaces the retry with failover. That was the one that I meant and we will be 
able to get both of these in. 


was (Author: lmccay):
Hi [~MatthewSharp] - I was just reviewing this today and comparing to some of 
the other dispatch related JIRAs and reports that I have been fielding. I 
wanted to make sure that they did not contradict one another and I am convinced at 
this point that they don't.

We will be able to get this merged shortly, I think.

In general, we will merge fixes as they come in as long as the review can be 
conclusive which has been hard to do until these other issues have been fully 
understood. Thanks for your patience!

 

> AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
> -----------------------------------------------------------------
>
>                 Key: KNOX-1436
>                 URL: https://issues.apache.org/jira/browse/KNOX-1436
>             Project: Apache Knox
>          Issue Type: Bug
>            Reporter: Matthew Sharp
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: KNOX-1436.patch
>
>
> The current WebHDFS failoverRequest method makes it a bit difficult to track 
> which host it failed on vs. which it is retrying next. 
> Example:
> {code:java}
> 2018-09-06 07:49:07,245 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a 
> node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:07,246 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:08,278 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a 
> node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:08,279 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:09,291 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a 
> node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:09,291 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,366 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a 
> node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:10,367 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to 
> a different server: 
> http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,368 INFO knox.gateway 
> (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to 
> failover reached for service: WEBHDFS
> {code}
> In the example above, host1.example.com had already failed, yet the message 
> reports failing over to a different server while still showing host1.example.com.
> Suggestion:
> The HaDispatchMessages call for failingOverRequest should be moved below the 
> markFailedURL call, so that it logs the next URI it is failing over to (not the 
> current one that already failed).
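
The ordering suggested above can be illustrated with a minimal, self-contained sketch. It is not the Knox implementation: the class FailoverLoggingSketch, its queue-based URL rotation, and the in-memory log list are all assumptions made for illustration. Only the names markFailedURL and failoverRequest and the message wording are taken from the report. The point it shows is that logging after the failed URL has been marked reports the server that will actually be tried next.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for an HA dispatch; not the actual Knox classes.
class FailoverLoggingSketch {
    // Assumption: the head of the queue is the currently active URL.
    private final Deque<String> urls = new ArrayDeque<>();
    // Log lines are captured in memory so the ordering effect can be inspected.
    private final List<String> log = new ArrayList<>();

    FailoverLoggingSketch(List<String> hosts) {
        urls.addAll(hosts);
    }

    // Rotate the failed URL to the back so the next candidate becomes active.
    void markFailedURL(String failedUrl) {
        if (failedUrl.equals(urls.peekFirst())) {
            urls.addLast(urls.removeFirst());
        }
    }

    String getActiveURL() {
        return urls.peekFirst();
    }

    // Suggested ordering: mark the failure first, then log the URL tried next.
    String failoverRequest(String failedUrl) {
        markFailedURL(failedUrl);
        String next = getActiveURL();
        log.add("Failing over request to a different server: " + next);
        return next;
    }

    List<String> getLog() {
        return log;
    }

    public static void main(String[] args) {
        FailoverLoggingSketch dispatch = new FailoverLoggingSketch(Arrays.asList(
                "http://host1.example.com:50070",
                "http://host2.example.com:50070"));
        // host1 has just failed; the log line should name host2, not host1.
        dispatch.failoverRequest("http://host1.example.com:50070");
        dispatch.getLog().forEach(System.out::println);
    }
}
```

If the log call were placed before markFailedURL, as the report describes for the current code, the message would contain the URL that had just failed, which matches the confusing output in the example log.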



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
