Xiao Liu created HBASE-30265:
--------------------------------
Summary: Fix flaky TestProcDispatcher.testRetryLimitOnConnClosedErrors
Key: HBASE-30265
URL: https://issues.apache.org/jira/browse/HBASE-30265
Project: HBase
Issue Type: Bug
Components: test
Reporter: Xiao Liu
Assignee: Xiao Liu
Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7
h3. Symptom
{{TestProcDispatcher.testRetryLimitOnConnClosedErrors}} fails intermittently
(seen in https://github.com/apache/hbase/actions/runs/28292988640): it times
out in the first {{waitFor}} with {{Num of SCPs: 0}}.
h3. Root cause
The test helper {{RSProcDispatcher}} decides when to inject connection errors
from a global static {{sendRequest()}} call count (it throws on calls 8-13 and
18-23). {{remoteDispatch()}} runs for *every* remote procedure in the
cluster (startup, table creation, flush/compact, chores, assignments), so the
number of background dispatches that happen before the test moves its regions
is nondeterministic. On a busy run the counter is already past the injection
windows by the time the moves happen, so no error is injected, the fail-fast
retry limit is never reached, no {{ServerCrashProcedure}} is scheduled, and the
first {{waitFor}} times out.
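A minimal, self-contained model of counter-based injection, to show why background traffic defeats it. The class {{GlobalCounterInjector}} and its methods are hypothetical illustrations, not the actual {{RSProcDispatcher}} code; only the 8-13 / 18-23 injection windows come from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of the counter-based injection (not the real
 * RSProcDispatcher code). One counter is shared by every dispatch.
 */
final class GlobalCounterInjector {
  // Stands in for the global static sendRequest() call count.
  private final AtomicInteger calls = new AtomicInteger();

  /** Returns true if the current (1-based) call falls in an injection window. */
  boolean shouldFail() {
    int n = calls.incrementAndGet();
    return (n >= 8 && n <= 13) || (n >= 18 && n <= 23);
  }

  /**
   * Simulates `background` unrelated dispatches followed by `testCalls`
   * dispatches issued by the test; returns how many test calls got an error.
   */
  static int injectedIntoTest(int background, int testCalls) {
    GlobalCounterInjector injector = new GlobalCounterInjector();
    for (int i = 0; i < background; i++) {
      injector.shouldFail(); // any error here hits an unrelated procedure
    }
    int injected = 0;
    for (int i = 0; i < testCalls; i++) {
      if (injector.shouldFail()) {
        injected++;
      }
    }
    return injected;
  }
}
```

With no background traffic, {{injectedIntoTest(0, 30)}} injects 12 errors into the test's own calls; after 30 unrelated dispatches, {{injectedIntoTest(30, 30)}} injects none, which corresponds to the {{Num of SCPs: 0}} failure mode.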
h3. Fix
Bind error injection to the open/close-region requests of the test's own table
(driven explicitly by the test) instead of a global counter, so it
deterministically targets the operations under test regardless of background
activity.
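A sketch of the targeted approach, under stated assumptions: the class {{TargetedInjector}}, its {{RequestType}} enum, the explicit {{arm()}} switch, and the per-region failure budget are all hypothetical, chosen only to illustrate keying injection on the test's own table and on open/close-region requests rather than on a global call count.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical targeted injector (not the actual fix). Errors are injected
 * only into open/close-region requests for one table, after the test arms it.
 */
final class TargetedInjector {
  enum RequestType { OPEN_REGION, CLOSE_REGION, OTHER }

  private final String targetTable;
  private final int failuresPerRegion;
  // Number of matching requests seen so far, per region of the target table.
  private final Map<String, Integer> seen = new HashMap<>();
  private boolean armed;

  TargetedInjector(String targetTable, int failuresPerRegion) {
    this.targetTable = targetTable;
    this.failuresPerRegion = failuresPerRegion;
  }

  /** Called explicitly by the test right before it moves its regions. */
  synchronized void arm() {
    armed = true;
  }

  /** Decides based on what the request is, never on global call order. */
  synchronized boolean shouldFail(String table, String region, RequestType type) {
    if (!armed || !targetTable.equals(table) || type == RequestType.OTHER) {
      return false; // background or unrelated traffic is never affected
    }
    int n = seen.merge(region, 1, Integer::sum);
    return n <= failuresPerRegion; // fail the first N attempts per region
  }
}
```

Because the decision depends only on which request is being sent, background dispatches for other tables or other procedure types can neither consume nor shift the injected failures.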