Dan Hecht has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc 
timeout
......................................................................


Patch Set 20:

(5 comments)

Do you have a new patchset I can look at?

http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.cc
File be/src/runtime/client-cache.cc:

Line 162:   client_map_.erase(client);
> http://kera.name/articles/2011/06/iterator-invalidation-rules/
Yeah, the iterator is still valid since no other thread would have removed it.  
And I missed that you do retake the lock to protect erase(), so agree we are 
okay.


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS20, Line 224: indicate
> if returned status is okay, means both RPC are completed. can or cannot ret
Right, but in the case of success, it's not really a "retry". That's just 
saying whether the RPC is idempotent or not.

I'm more talking about whether we guarantee that retry_is_safe is always set 
regardless of the return value, or if we only set it for errors.  The words 
"the failure" makes it ambigous.  How about saying:

In case of error, '*retry_is_safe' is set to true if it's safe to retry the RPC 
because the send never occurred.  Otherwise, it's set to false to indicate that 
the RPC was in progress when it failed, and therefore retrying the RPC is not 
safe.


Line 265:       } catch (apache::thrift::TException& e) {
> I think the original logic is to give it a second chance but if it fails ag
I'm fine with leaving this alone for now but it seems a little weird.


PS20, Line 301: the
> see my previous patch and Henry's comment
I'm fine with either, so if it was already discussed and this is preferred, 
okay to leave it alone.


http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS20, Line 231: timeout_ms
> For now, we like to wait indefinitely to avoid logic change.
It'd be preferred to remove the timeout_ms param and this extra logic until we 
actually start using it, since having it in the code base implies it's being 
tested, but it's not.  Also it's confusing to have dead code, everyone's going 
to wonder if this is intentional or not (as was my question).  So, would you 
mind moving this into the "future" patch that sets non-zero timeouts?


-- 
To view, visit http://gerrit.cloudera.org:8080/3343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 20
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <[email protected]>
Gerrit-Reviewer: Alan Choi <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Juan Yu <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes

Reply via email to