Matthew Sharp created KNOX-1436:
-----------------------------------
Summary: AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
Key: KNOX-1436
URL: https://issues.apache.org/jira/browse/KNOX-1436
Project: Apache Knox
Issue Type: Bug
Reporter: Matthew Sharp
The current WebHDFS failoverRequest method makes it a bit difficult to track
which host it failed on vs. which it is retrying next.
Example:
{code:java}
2018-09-06 07:49:07,245 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:07,246 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:08,278 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:08,279 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,366 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:10,367 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,368 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to failover reached for service: WEBHDFS
{code}
In the example above, host1.example.com had already failed on the first
attempt, yet the message reports failing over to a different server while
still naming host1.example.com.
Suggestion:
The HaDispatchMessages failingOverRequest log call should be moved below the
markFailedURL call, so that it reports the next URI the request is failing
over to rather than the URI that just failed.
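
The reordering suggested above could look roughly like the following minimal sketch. It is not the Knox implementation: the class FailoverLoggingSketch, its URL rotation, and the message wording are hypothetical stand-ins used only to show the idea of marking the URL as failed first and logging the next active URL afterwards. The real HaProvider may select the next URL differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the HA URL bookkeeping; not a Knox class.
public class FailoverLoggingSketch {
    private final Deque<String> urls = new ArrayDeque<>();

    public FailoverLoggingSketch(List<String> initialUrls) {
        urls.addAll(initialUrls);
    }

    // The URL the next attempt would be sent to.
    public String getActiveURL() {
        return urls.peekFirst();
    }

    // Assumption: a failed URL is rotated to the back of the queue.
    public void markFailedURL(String failedUrl) {
        if (urls.remove(failedUrl)) {
            urls.addLast(failedUrl);
        }
    }

    // Mark the failure first, then build the message from the next active
    // URL, so the log names both the failed host and the retry target.
    public String failoverMessage(String failedUrl) {
        markFailedURL(failedUrl);
        String nextUrl = getActiveURL();
        return "Failed on " + failedUrl + "; failing over request to " + nextUrl;
    }

    public static void main(String[] args) {
        FailoverLoggingSketch sketch = new FailoverLoggingSketch(
            List.of("http://host1.example.com:50070", "http://host2.example.com:50070"));
        System.out.println(sketch.failoverMessage("http://host1.example.com:50070"));
    }
}
```

Logging both the failed URL and the next one in a single message is an extra choice made in this sketch. The suggestion itself only requires that the logged URI be read after markFailedURL has run.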