Dan Hecht has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc 
timeout
......................................................................


Patch Set 20:

(24 comments)

Need to look through the last few files still but here's the first set of 
comments.

http://gerrit.cloudera.org:8080/#/c/3343/20//COMMIT_MSG
Commit Message:

Line 12: Impala doesn't set socket send/recv timeout for backend client.
Prior to this change, ...


PS20, Line 24: new RPC call RetryRpcRecv()
This makes it sound like RetryRpcRecv() is a new RPC.  I haven't looked at the 
change yet, but I wouldn't expect it to be.  Clarify.


PS20, Line 28: call
delete (C of RPC stands for call)


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

PS20, Line 102: call
delete


PS20, Line 102: which trigger caller hits RPC timeout.
to trigger an RPC timeout on the caller side.


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.cc
File be/src/runtime/client-cache.cc:

Line 162:   client_map_.erase(client);
is it legal to use the 'client' iterator even though the lock was dropped in 
the mean time? i.e. don't we need to re-find in case the map has changed?


Line 163:   *client_key = NULL;
why doesn't this method need to remove from the per_host_caches_ as well? The 
header comment seems to imply that happens.


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS20, Line 96: cache
is this "the per-host cache" that is referenced everywhere else, or something 
different? clarify.


PS20, Line 96: it
what is "it"? the client? or the connection?  does this NULL out *client_key 
like ReleaseClient()?


PS20, Line 224: indicate
indicates

Is retry_is_safe an in parameter or out parameter? Clarify this.
also, is it the case that *return_is_safe is only relevant when returned status 
is not okay?


Line 225:   /// the whole RPC call.
explain this more.  is it because if the send RPC call failed, then the handler 
was never executed to process the RPC. On the other hand, if the send call 
completed, the RPC can't be restarted because the handler is already in 
progress.  Is that right?


Line 260:       if (!status.ok()) {
so this is the only path where we have *retry_is_safe == true and an error is 
returned, right?  If so, why not set *retry_is_safe to false by default (line 
240), and then only set it to true here.  That would make the intent of the 
code clearer because then it's calling out the exceptional failure path where 
we can retry.


Line 265:       } catch (apache::thrift::TException& e) {
why does this path not need the timeout check also?


Line 271:     client_is_unrecoverable_ = false;
why is this logic needed now, whereas it wasn't needed before? or is this 
another bug fix that would have been beneficial even without the retry logic?


PS20, Line 301: the
whether the

but why not just call this 'last_rpc_succeeded_' to make it more self 
explanatory?


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 115:   ImpalaBackendClientCache* client_cache_;
now that we store the RuntimeState, let's get rid of this and just get the 
cache from the runtime state when needed (which isn't very perf critical), to 
make things easier to follow.


PS20, Line 153: wait
retry waiting for the response?


PS20, Line 153: call
delete "call", and would be nice to write RPC


PS20, Line 231: timeout_ms
where is this called with a non-zero timeout_ms?


Line 239:               timeout_ms));
this will throw away the original status message.  Maybe use AddDetail() 
instead on the original status.


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/service/fragment-exec-state.cc
File be/src/service/fragment-exec-state.cc:

Line 125:     if (!retry_is_safe) break;
so i guess this was a bug in the old code?


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/service/fragment-mgr.cc
File be/src/service/fragment-mgr.cc:

Line 38:   FAULT_INJECTION_RPC_DELAY(RPC_EXECPLANFRAGMENT);
is there a reason we decided to put these here rather than in 
impala-internal-services.h handlers, which would then show we've marked up all 
the RPCs?


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

PS20, Line 658: Rewrite
Add details to


PS20, Line 669: Rewrite
same


-- 
To view, visit http://gerrit.cloudera.org:8080/3343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 20
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <[email protected]>
Gerrit-Reviewer: Alan Choi <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Juan Yu <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to