Dan Hecht has posted comments on this change. Change subject: IMPALA-3575: Add retry to backend connection request and rpc timeout ......................................................................
Patch Set 20: (5 comments) Do you have a new patchset I can look at? http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.cc File be/src/runtime/client-cache.cc: Line 162: client_map_.erase(client); > http://kera.name/articles/2011/06/iterator-invalidation-rules/ Yeah, the iterator is still valid since no other thread would have removed it. And I missed that you do retake the lock to protect erase(), so agree we are okay. http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: PS20, Line 224: indicate > if returned status is okay, means both RPC are completed. can or cannot ret Right, but in the case of success, it's not really a "retry". That's just saying whether the RPC is idempotent or not. I'm more talking about whether we guarantee that retry_is_safe is always set regardless of the return value, or if we only set it for errors. The words "the failure" makes it ambigous. How about saying: In case of error, '*retry_is_safe' is set to true if it's safe to retry the RPC because the send never occurred. Otherwise, it's set to false to indicate that the RPC was in progress when it failed, and therefore retrying the RPC is not safe. Line 265: } catch (apache::thrift::TException& e) { > I think the original logic is to give it a second chance but if it fails ag I'm fine with leaving this alone for now but it seems a little weird. PS20, Line 301: the > see my previous patch and Henry's comment I'm fine with either, so if it was already discussed and this is preferred, okay to leave it alone. http://gerrit.cloudera.org:8080/#/c/3343/20/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS20, Line 231: timeout_ms > For now, we like to wait indefinitely to avoid logic change. It'd be preferred to remove the timeout_ms param and this extra logic until we actually start using it, since having it in the code base implies it's being tested, but it's not. Also it's confusing to have dead code, everyone's going to wonder if this is intentional or not (as was my question). So, would you mind moving this into the "future" patch that sets non-zero timeouts? -- To view, visit http://gerrit.cloudera.org:8080/3343 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964 Gerrit-PatchSet: 20 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Juan Yu <[email protected]> Gerrit-Reviewer: Alan Choi <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
