Xiao Liu created HBASE-30265:
--------------------------------

             Summary: Fix flaky TestProcDispatcher.testRetryLimitOnConnClosedErrors
                 Key: HBASE-30265
                 URL: https://issues.apache.org/jira/browse/HBASE-30265
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: Xiao Liu
            Assignee: Xiao Liu
             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.16, 2.6.7


h3. Symptom
{{TestProcDispatcher.testRetryLimitOnConnClosedErrors}} fails intermittently
(seen https://github.com/apache/hbase/actions/runs/28292988640): it times out
in the first {{waitFor}} with {{Num of SCPs: 0}}.

h3. Root cause
The test helper {{RSProcDispatcher}} decides when to inject connection errors
from a global static {{sendRequest()}} call count (it throws on the 8th-13th and
18th-23rd calls). {{remoteDispatch()}} runs for *every* remote procedure in the
cluster (startup, table creation, flush/compact, chores, assignments), so the
number of background dispatches before the test's region moves is
nondeterministic. On a busy run the counter is already past the injection
window by the time the moves happen, so no error is injected, the fail-fast
retry limit is never reached, no {{ServerCrashProcedure}} is scheduled, and the
assertion times out.
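
The effect of the call-count window can be illustrated with a small standalone model. This is only a sketch: the class {{GlobalCounterInjectionSketch}} and its methods are hypothetical and do not reproduce the real {{RSProcDispatcher}} code. It assumes only the windows named above (8th-13th and 18th-23rd call) and shows how the number of errors seen by the test's own calls depends on how many background calls came first.

```java
// Hypothetical model of count-based error injection; NOT the actual HBase test helper.
final class GlobalCounterInjectionSketch {
    // Shared call counter, standing in for the global static count described above.
    private int callCount = 0;

    /** Returns true when the current call is the 8th-13th or 18th-23rd call overall. */
    boolean shouldInject() {
        callCount++;
        return (callCount >= 8 && callCount <= 13) || (callCount >= 18 && callCount <= 23);
    }

    /** Counts injected errors among testCalls calls made after backgroundCalls unrelated calls. */
    static int injectedDuringTest(int backgroundCalls, int testCalls) {
        GlobalCounterInjectionSketch dispatcher = new GlobalCounterInjectionSketch();
        for (int i = 0; i < backgroundCalls; i++) {
            dispatcher.shouldInject(); // background activity consumes the counter
        }
        int injected = 0;
        for (int i = 0; i < testCalls; i++) {
            if (dispatcher.shouldInject()) {
                injected++;
            }
        }
        return injected;
    }

    public static void main(String[] args) {
        // Quiet run: the test's calls are the 6th-15th, overlapping the first window.
        System.out.println(injectedDuringTest(5, 10));  // prints 6
        // Busy run: the test's calls are the 31st-40th, past both windows.
        System.out.println(injectedDuringTest(30, 10)); // prints 0
    }
}
```

In the busy case no call made by the test falls inside either window, which matches the observed {{Num of SCPs: 0}} timeout.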

h3. Fix
Bind error injection to the open/close-region requests of the test's own table
(driven explicitly by the test) instead of a global counter, so it
deterministically targets the operations under test regardless of background
activity.
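
One way to picture this kind of targeted injection is sketched below. It is not the actual patch: {{TargetedInjectionSketch}}, {{RequestType}}, and the {{arm()}} / {{disarm()}} switch are hypothetical names introduced for illustration, and the assumption that the test toggles injection explicitly is only one reading of "driven explicitly by the test". The point is that the decision depends on the request's table and type rather than on how many calls came before it.

```java
// Hypothetical sketch of request-targeted error injection; NOT the actual HBase patch.
final class TargetedInjectionSketch {
    enum RequestType { OPEN_REGION, CLOSE_REGION, OTHER }

    private final String targetTable;
    // Assumed to be toggled explicitly by the test around the region moves.
    private volatile boolean armed = false;

    TargetedInjectionSketch(String targetTable) {
        this.targetTable = targetTable;
    }

    void arm() {
        armed = true;
    }

    void disarm() {
        armed = false;
    }

    /** Injects only for open/close-region requests of the target table while armed. */
    boolean shouldInject(String table, RequestType type) {
        return armed
            && targetTable.equals(table)
            && (type == RequestType.OPEN_REGION || type == RequestType.CLOSE_REGION);
    }

    public static void main(String[] args) {
        TargetedInjectionSketch injector = new TargetedInjectionSketch("testTable");
        injector.arm();
        // Any amount of background traffic leaves the decision unchanged.
        for (int i = 0; i < 1000; i++) {
            injector.shouldInject("hbase:meta", RequestType.OPEN_REGION);
        }
        System.out.println(injector.shouldInject("testTable", RequestType.CLOSE_REGION)); // prints true
        System.out.println(injector.shouldInject("testTable", RequestType.OTHER));        // prints false
    }
}
```

Because the predicate carries no call counter, the amount of startup, chore, or assignment traffic before the region moves cannot shift the injection away from the operations under test.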



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
