Juan Yu has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc timeout
......................................................................

Patch Set 21:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS21, Line 227: is
> delete "is"
Done

Line 304: TNetworkAddress address_;
> can't we get this from client_->address()?
"client_" is not an instance of ThriftClientImpl.

http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS21, Line 134: 300000
> Is there a short comment you could write to justify how this was chosen (5
Done

PS21, Line 134: The time after "
              : "which a backend client send/recv RPC call will timeout.
> The send/recv connection timeout in milliseconds for a backend client RPC.
This is the underlying TSocket send/recv call timeout, not the connection timeout.

PS21, Line 138: 
> same
Done

PS21, Line 157: 0
> why is this 0? (wait_ms)
This is for retrying the connection open. Each retry usually takes several seconds, so waiting even longer won't help much.

PS21, Line 162: 100
> how was this chosen?
I'll set this to 0.

Line 223: "", !FLAGS_ssl_client_ca_certificate.empty())),
> not your change, but it's really unfortunate we duplicate this code. let's
I'll add a TODO here.

http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/testutil/fault-injection-util.h
File be/src/testutil/fault-injection-util.h:

PS21, Line 36: RPC_RANDOM
> comment that this must be last
Done

PS21, Line 39: call
> delete
Done

PS21, Line 40: timeout
> this is the recv connection timeout, correct? if so, how about saying "recv
Done

PS21, Line 41: RpcCallType my_type, int32_t rpc_type, int32_t delay_ms
> document these.
Done

PS21, Line 44: rpc_type == RPC_NULL
> what is specifying RPC_NULL used for?
Just for easy testing: you can enable or disable the fault injection by changing the value, with no need to add or remove the startup flag. In the future, we could change this value dynamically to do more testing.
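For readers following the fault-injection thread, the gating discussed above might look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in the patch: the enum members other than RPC_NULL and RPC_RANDOM, the helper names ResolveTargetType, ShouldInjectDelay and InjectRpcDelaySketch, and the extra random-generator parameter are all hypothetical. The idea shown is that RPC_NULL turns injection off, a concrete type delays only the matching call site, and RPC_RANDOM (which has to stay last in the enum) picks one of the real types.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <thread>

// Hypothetical RPC kinds. Only RPC_NULL and RPC_RANDOM appear in the review;
// the two middle names are placeholders invented for this sketch.
enum RpcCallType {
  RPC_NULL = 0,   // fault injection disabled
  RPC_EXAMPLE_A,  // placeholder, not from the patch
  RPC_EXAMPLE_B,  // placeholder, not from the patch
  RPC_RANDOM      // must stay last: it bounds the random pick below
};

// Maps the configured value to a concrete RPC type. RPC_RANDOM picks one of
// the real types uniformly; any other value is returned unchanged.
inline int32_t ResolveTargetType(int32_t rpc_type, std::mt19937& rng) {
  if (rpc_type != RPC_RANDOM) return rpc_type;
  std::uniform_int_distribution<int32_t> pick(RPC_NULL + 1, RPC_RANDOM - 1);
  return pick(rng);
}

// Returns true when the call site identified by my_type should be delayed.
// The type and delay are passed in rather than read from flags, so a caller
// could vary them at runtime.
inline bool ShouldInjectDelay(RpcCallType my_type, int32_t rpc_type,
                              int32_t delay_ms, std::mt19937& rng) {
  assert(delay_ms >= 0);
  if (rpc_type == RPC_NULL || delay_ms == 0) return false;
  return ResolveTargetType(rpc_type, rng) == my_type;
}

// Sleeps for delay_ms when the gating above says this call site is targeted.
inline void InjectRpcDelaySketch(RpcCallType my_type, int32_t rpc_type,
                                 int32_t delay_ms, std::mt19937& rng) {
  if (ShouldInjectDelay(my_type, rpc_type, delay_ms, rng)) {
    std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
  }
}
```

With this shape, a call site would invoke InjectRpcDelaySketch with its own type plus the configured type and delay. Leaving the configured type at RPC_NULL makes the call a no-op, so the hook can stay in place when fault injection is off.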

Line 50: FLAGS_fault_injection_rpc_type, FLAGS_fault_injection_rpc_delay_ms)
> why pass these as arguments rather than just having InjectRpcDelay() read t
Similar reason as above: we could test with dynamic values without needing to restart the cluster.

http://gerrit.cloudera.org:8080/#/c/3343/21/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

Line 119: self.execute_query_verify_metrics(self.TEST_QUERY, 10)
> how long do all these tests take to execute? let's run them only in exhaus
About 5 minutes. OK, I'll change them to run only in exhaustive mode.

-- 
To view, visit http://gerrit.cloudera.org:8080/3343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 21
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <[email protected]>
Gerrit-Reviewer: Alan Choi <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Juan Yu <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
