[
https://issues.apache.org/jira/browse/HBASE-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087897#comment-16087897
]
Enis Soztutar commented on HBASE-18061:
---------------------------------------
Thanks Sudeep for the patch.
I've run the unit tests which pass. I was also be to run simple-client doing
multi-gets against 500K rows with batch sizes 1,10,100,1000,10000 and 100000.
A batch size of 500K results in
{code}
what(): LifoSemMPMCQueue full, can't add item
{code}
but it is fine for now.
Why are we doing try lock here? What happens when we cannot acquire the lock?
{code}
- std::lock_guard<std::mutex> lock(multi_mutex_);
- for (uint64_t num = 0; num < completed_responses.size(); ++num) {
+ std::unique_lock<std::mutex> lock(multi_mutex_, std::try_to_lock);
{code}
We are doing this in multiple places actually, and we never check the results
of whether the lock was acquired or not. Seems something to fix before commit.
remove these from the patch:
{code}
+//#include <gmock/gmock.h>
+//using ::testing::Return;
+//using ::testing::_;
{code}
I was double checking the logic again against the java implementation. It seems
that we are missing the function:
{code}
private List<ThrowableWithExtraContext> removeErrors(Action action) {
synchronized (action2Errors) {
return action2Errors.remove(action);
}
}
{code}
which is called from failOne() etc, and removes the error from the
actions2Errors map. However, logically, I think it is fine to not have this
removal logic. Just making a note here in case we run into it later.
- Mainly a code cleanness issue, but you should move this logic:
{code}
auto itr = search->second->actions_by_region().find(region_loc->region_name());
if (itr != search->second->actions_by_region().end()) {
search->second->AddActionsByRegion(region_loc, action);
} else {
search->second = std::make_shared<ServerRequest>(region_loc);
search->second->AddActionsByRegion(region_loc, action);
}
{code}
to be inside ServerRequest::AddActionsByRegion(). The logic for "add the action
to the list of actions by region by finding the region" belongs in this method,
and should not be exposed outside normally.
I think there is no multi-region test with failure conditions. We can look at
adding that in a follow up issue if needed.
In {{::Send()}}, this logic:
{code}
if (!ExceptionUtil::ShouldRetry(ew)) {
std::vector<std::shared_ptr<Action>> failed_actions;
for (auto &value : action_by_server.second->actions_by_region()) {
for (const auto &failed_action : value.second->actions()) {
failed_actions.push_back(failed_action);
}
}
FailAll(failed_actions, tries);
} else {
OnError(action_by_server.second->actions_by_region(), tries, ew,
action_by_server.first);
}
{code}
can be replaced by a simple OnError() call, no? OnError() already checks
whether exception should retry and calls FailAll() with the same vector of
actions. Can you please double check.
Other than these patch looks pretty good.
> [C++] Fix retry logic in multi-get calls
> ----------------------------------------
>
> Key: HBASE-18061
> URL: https://issues.apache.org/jira/browse/HBASE-18061
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Sudeep Sunthankar
> Fix For: HBASE-14850
>
> Attachments: HBASE-18061.HBASE-14850.v1.patch,
> HBASE-18061.HBASE-14850.v3.patch, HBASE-18061.HBASE-14850.v5.patch,
> HBASE-18061.HBASE-14850.v6.patch, HBASE-18061.HBASE-14850.v7.patch,
> hbase-18061-v8.patch
>
>
> HBASE-17576 adds multi-gets. There are a couple of todos to fix in the retry
> logic, and some unit testing to be done for the multi-gets. We'll do these in
> this issue.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)