dingwei2019 created HBASE-27959:
-----------------------------------
Summary: change the random scope of random row for
PerformanceEvaluation
Key: HBASE-27959
URL: https://issues.apache.org/jira/browse/HBASE-27959
Project: HBase
Issue Type: Improvement
Components: Performance
Affects Versions: 2.5.5, 2.5.0, 2.3.2
Reporter: dingwei2019
*Problem description:*
When we run the randomWrite test with the PerformanceEvaluation tool, we find
that as soon as one region reports RegionTooBusy, the request rate shown in the
UI drops dramatically within a short time, from several million to 0.
The RegionTooBusy mechanism is a good way to maximize the throughput of an
HBase cluster, and it should only affect the region that is too busy. But after
looking at the whole random-write path (both client and server), I found that
there may be a problem in the PerformanceEvaluation tool (the client) that
causes this behavior.
*Cause of the issue:*
Before illustrating the issue, here are two preconditions to know:
1. A single request generated by a TestClient thread contains data for all
regions. The client accumulates 2 MB of mutations (the default client buffer
size) into one request, and the row of each mutation is chosen at random from
the whole table.
2. Until that 2 MB request finishes, the TestClient does not generate a new
request (because of how PE works).
The cause of the issue is then:
1. When one region reports RegionTooBusy, the 2 MB request cannot finish until
that region is unblocked. The same holds for the 2 MB requests generated by the
other TestClients.
2. Both the client (PerformanceEvaluation) and the server (RegionServer) stall,
so the request rate in the UI and the CPU utilization drop for several seconds,
until the region is unblocked.
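A rough estimate shows why nearly every in-flight request ends up waiting on
the busy region. The sketch below is only an illustration under stated
assumptions: rows are spread uniformly over equally sized regions, each
mutation takes about 1000 bytes of the 2 MB buffer, and the table has 100
regions. The class name and these numbers are hypothetical and are not taken
from PerformanceEvaluation.

```java
/** Hypothetical estimate, not part of PerformanceEvaluation. */
final class BatchBlockingEstimate {
    private BatchBlockingEstimate() {}

    /**
     * Probability that a batch of independently, uniformly chosen rows
     * contains at least one row belonging to one particular region.
     */
    static double probabilityBatchHitsRegion(int mutationsPerBatch, int regionCount) {
        double missOne = 1.0 - 1.0 / regionCount;        // one mutation misses the region
        return 1.0 - Math.pow(missOne, mutationsPerBatch); // at least one mutation hits it
    }

    public static void main(String[] args) {
        int bufferBytes = 2 * 1024 * 1024; // 2 MB client buffer, as described above
        int bytesPerMutation = 1000;       // assumed size of one mutation
        int regionCount = 100;             // assumed number of regions
        int mutationsPerBatch = bufferBytes / bytesPerMutation;

        double p = probabilityBatchHitsRegion(mutationsPerBatch, regionCount);
        System.out.printf("mutations per batch: %d%n", mutationsPerBatch);
        System.out.printf("P(batch touches the busy region): %.9f%n", p);
    }
}
```

Under these assumptions the probability is very close to 1, which matches the
observation that all TestClients stall together when a single region is busy.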
*Possible solution:*
The issue is not caused by the RegionServer (RegionTooBusy is a good mechanism
from the RegionServer's side) but by the client.
Changing the scope of the random row selection would solve the problem.
Currently the scope is the whole table. If every TestClient generated random
rows only within its own range, a busy region would block only the TestClients
whose ranges cover it. For example:
Assume we have 5 TestClients, each responsible for 1000 requests.
TestClient1 will generate random rows from 0–999;
TestClient2 will generate random rows from 1000–1999;
TestClient3 will generate random rows from 2000–2999;
TestClient4 will generate random rows from 3000–3999;
TestClient5 will generate random rows from 4000–4999;
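
A minimal sketch of the proposed scoping, assuming integer row indexes and
zero-based client indexes. The class and method names are hypothetical and do
not refer to existing PerformanceEvaluation code; an actual patch would need to
fit into how PE already assigns rows to each TestClient.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper, not part of PerformanceEvaluation. */
final class ScopedRandomRow {
    private ScopedRandomRow() {}

    /** Behavior described in the issue: any row of the whole table. */
    static int wholeTableRow(int totalRows) {
        return ThreadLocalRandom.current().nextInt(totalRows);
    }

    /**
     * Proposed behavior: a random row taken only from the slice owned by
     * this client, i.e. [clientIndex * rowsPerClient, (clientIndex + 1) * rowsPerClient).
     */
    static int scopedRow(int clientIndex, int rowsPerClient) {
        int start = clientIndex * rowsPerClient;
        return start + ThreadLocalRandom.current().nextInt(rowsPerClient);
    }

    public static void main(String[] args) {
        int rowsPerClient = 1000; // matches the example above
        for (int client = 0; client < 5; client++) {
            int first = client * rowsPerClient;
            int last = first + rowsPerClient - 1;
            System.out.printf("TestClient%d picked row %d from range %d-%d%n",
                client + 1, scopedRow(client, rowsPerClient), first, last);
        }
    }
}
```

With this scoping, a request buffered by one TestClient only contains rows from
that client's range. Whether this confines the stall to a few clients depends
on each range mapping to a small number of regions, which is an assumption
about the table's split points.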
*Other remarks:*
I ran into this problem in my work. I hope to discuss it further with experts
from the community and other fields in order to find a better solution. If my
solution is acceptable, I will submit a patch to fix the issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)