Alexey Serbin created KUDU-3585:
-----------------------------------
Summary: ClientTest.ClearCacheAndConcurrentWorkload fails from
time to time in TSAN builds
Key: KUDU-3585
URL: https://issues.apache.org/jira/browse/KUDU-3585
Project: Kudu
Issue Type: Sub-task
Components: client, test
Affects Versions: 1.17.0, 1.16.0, 1.15.0, 1.14.0
Reporter: Alexey Serbin
The scenario sometimes fails in TSAN builds with output like that cited below.
It seems the root cause is RPC queue overflows at kudu-master and
kudu-tserver. Both spend much more time on regular requests when built with
TSAN instrumentation, and resetting the client's meta-cache too often induces
a lot of GetTableLocations requests. Serving those requests consumes a lot of
CPU and keeps many threads busy. Since an internal mini-cluster is used in the
scenario (i.e. all masters and tablet servers are part of a single process),
that affects kudu-tserver RPC worker threads as well, so many requests
accumulate in the RPC queues.
{noformat}
src/kudu/client/client-test.cc:408: Failure
Expected equality of these values: 0
server->server()->rpc_server()->
service_pool("kudu.tserver.TabletServerService")->
RpcsQueueOverflowMetric()->value()
Which is: 1
src/kudu/client/client-test.cc:584: Failure
Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the
current thread.
Actual: it does.
src/kudu/client/client-test.cc:2466: Failure
Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't
generate new fatal failures in the current thread.
Actual: it does.
{noformat}
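To illustrate the suspected mechanism, here is a minimal sketch of a bounded request queue with an overflow counter, playing the role of the value that the RpcsQueueOverflowMetric() check in the output above expects to be 0. This is not Kudu's service pool code: BoundedRpcQueue and SimulateLoad are hypothetical names, and the queue length and the arrival and service rates are arbitrary numbers chosen only to show that slower workers (as in a TSAN build) turn the same request rate into overflows.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a bounded RPC service queue (not Kudu code).
class BoundedRpcQueue {
 public:
  explicit BoundedRpcQueue(size_t max_len) : max_len_(max_len) {}

  // Accepts a request if there is room; otherwise rejects it and
  // increments the overflow counter.
  bool Enqueue(int request_id) {
    if (queue_.size() >= max_len_) {
      ++overflows_;
      return false;
    }
    queue_.push_back(request_id);
    return true;
  }

  // Models a worker thread completing one queued request.
  bool ServeOne() {
    if (queue_.empty()) return false;
    queue_.pop_front();
    return true;
  }

  uint64_t overflows() const { return overflows_; }

 private:
  const size_t max_len_;
  std::deque<int> queue_;
  uint64_t overflows_ = 0;
};

// Runs `ticks` time steps. In each step `arrivals` requests arrive and the
// workers then complete up to `served` requests. Returns the overflow count.
uint64_t SimulateLoad(size_t queue_len, int ticks, int arrivals, int served) {
  BoundedRpcQueue q(queue_len);
  int next_id = 0;
  for (int t = 0; t < ticks; ++t) {
    for (int i = 0; i < arrivals; ++i) q.Enqueue(next_id++);
    for (int i = 0; i < served; ++i) q.ServeOne();
  }
  return q.overflows();
}
```

With workers that keep up, e.g. SimulateLoad(50, 100, 4, 4), the counter stays at zero. With the same arrival rate but slower workers, e.g. SimulateLoad(50, 100, 4, 1), the queue fills up and the counter becomes non-zero, which is the condition the failing assertion detects.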