Todd Lipcon created KUDU-1466:
---------------------------------
Summary: C++ client errors misreported as GetTableLocations timeouts
Key: KUDU-1466
URL: https://issues.apache.org/jira/browse/KUDU-1466
Project: Kudu
Issue Type: Bug
Components: client
Affects Versions: 0.8.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical

client-test is currently very flaky due to this issue:
- we inject some kind of failure on the tablet server (e.g. a DNS resolution
failure)
- when we fail to connect to the TS, we correctly re-trigger a lookup against
the master
- depending on how the backoffs and retries line up, we sometimes end up
triggering the lookup retry when the remaining operation budget is very short
(e.g. <10ms)
-- this GetTableLocations RPC then times out, since the master is unable to
respond within the ridiculously short timeout, and that timeout is reported in
place of the original tablet server error
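
To illustrate why the lookup can end up with such a short timeout, here is a minimal sketch. It assumes each master lookup attempt is capped by whatever remains of the overall operation budget. LookupTimeout and its parameters are hypothetical names for illustration, not the actual client code.

```cpp
#include <algorithm>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical helper (not a Kudu API): the timeout given to one master
// lookup attempt is the smaller of a default RPC timeout and the time left
// in the operation's overall budget.
milliseconds LookupTimeout(milliseconds default_rpc_timeout,
                           milliseconds remaining_budget) {
  return std::min(default_rpc_timeout, remaining_budget);
}
```

Under that assumption, a retry that fires with 8ms of budget left gives the master only 8ms to answer, regardless of the default RPC timeout.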

While retrying an operation, we should probably not replace the 'last_error'
with a master error, so long as we have had at least one successful master
lookup (thus indicating that the master is not the problem).
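
The proposed rule could be sketched roughly as follows. RetryState, ErrorSource and the method names are hypothetical stand-ins for the client's retry bookkeeping, not names from the Kudu source. The sketch also assumes that a master error is still recorded when no earlier error exists, since nothing is being replaced in that case.

```cpp
#include <string>

// Hypothetical types for illustration only; not taken from the Kudu client.
enum class ErrorSource { kTabletServer, kMaster };

struct RetryState {
  std::string last_error;
  bool master_lookup_succeeded = false;

  // Called whenever a lookup against the master completes successfully.
  void RecordMasterLookupSuccess() { master_lookup_succeeded = true; }

  // Called whenever an attempt fails. A master error does not overwrite an
  // existing error once at least one master lookup has succeeded, because
  // the master is then unlikely to be the real problem.
  void RecordError(ErrorSource source, const std::string& message) {
    const bool keep_previous = source == ErrorSource::kMaster &&
                               master_lookup_succeeded &&
                               !last_error.empty();
    if (!keep_previous) {
      last_error = message;
    }
  }
};
```

With this policy, a DNS failure on the tablet server followed by a master lookup timeout would still surface the DNS failure to the caller.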