Juan Yu has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc timeout
......................................................................


Patch Set 21:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

PS21, Line 227: is
> delete "is"
Done


Line 304:   TNetworkAddress address_;
> can't we get this from client_->address()?
"client_" is not an instance of ThriftClientImpl


http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS21, Line 134: 300000
> Is there a short comment you could write to justify how this was chosen (5 
Done


PS21, Line 134: The time after "
              :     "which a backend client send/recv RPC call will timeout.
> The send/recv connection timeout in milliseconds for a backend client RPC.
This is the timeout for the underlying TSocket send/recv calls, not the connection timeout.


PS21, Line 138:  
> same
Done


PS21, Line 157: 0
> why is this 0? (wait_ms)
This is for retrying opening the connection. Each retry usually takes several 
seconds already, so waiting even longer between attempts won't help much.
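
To illustrate the point about wait_ms, here is a minimal sketch of a retry loop. RetryOpen and its parameters are hypothetical names made up for this example and are not the code in the patch; the sketch only assumes that a failed open attempt is itself slow, so a zero wait between attempts is reasonable.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper for illustration only, not Impala's client cache code.
// Calls open_fn up to num_tries times and sleeps wait_ms between failed
// attempts. Returns the number of attempts used on success, or -1 if every
// attempt failed.
inline int RetryOpen(const std::function<bool()>& open_fn, int num_tries,
                     int wait_ms) {
  assert(num_tries > 0 && wait_ms >= 0);
  for (int attempt = 1; attempt <= num_tries; ++attempt) {
    if (open_fn()) return attempt;
    // With wait_ms == 0 the next attempt starts immediately. A failed open
    // that already blocked for seconds gains little from an extra sleep.
    if (wait_ms > 0 && attempt < num_tries) {
      std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
    }
  }
  return -1;
}
```

With wait_ms set to 0, the total time spent is bounded by num_tries times the duration of one failed open attempt.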


PS21, Line 162: 100
> how was this chosen?
I'll set this to 0.


Line 223:             "", !FLAGS_ssl_client_ca_certificate.empty())),
> not your change, but it's really unfortunate we duplicate this code. let's 
I'll add a TODO here.


http://gerrit.cloudera.org:8080/#/c/3343/21/be/src/testutil/fault-injection-util.h
File be/src/testutil/fault-injection-util.h:

PS21, Line 36: RPC_RANDOM
> comment that this must be last
Done


PS21, Line 39: call
> delete
Done


PS21, Line 40: timeout
> this is the recv connection timeout, correct? if so, how about saying "recv
Done


PS21, Line 41: RpcCallType my_type, int32_t rpc_type, int32_t delay_ms
> document these.
Done


PS21, Line 44: rpc_type == RPC_NULL
> what is specifying RPC_NULL used for?
Just for easy testing: you can enable or disable the fault injection by 
changing the value, without adding or removing the startup flag. In the future, 
we could change this value dynamically to do more testing.


Line 50:       FLAGS_fault_injection_rpc_type, FLAGS_fault_injection_rpc_delay_ms)
> why pass these as arguments rather than just having InjectRpcDelay() read t
Similar reason as above: we could test with dynamic values without needing to 
restart the cluster.
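
A rough sketch of the idea behind the two answers above, for illustration only. The parameter list (my_type, rpc_type, delay_ms) and the names RPC_NULL, RPC_RANDOM and InjectRpcDelay come from this review; the other enum values, the ComputeRpcDelayMs helper and the way RPC_RANDOM picks a target are assumptions and may differ from the patch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <thread>

// Only RPC_NULL and RPC_RANDOM appear in the review. The two middle values
// are placeholders invented for this sketch.
enum RpcCallType {
  RPC_NULL = 0,   // fault injection disabled
  RPC_EXAMPLE_A,  // hypothetical call type
  RPC_EXAMPLE_B,  // hypothetical call type
  RPC_RANDOM      // must be last: used here as the upper bound for random picks
};

// Hypothetical helper: returns the delay (in ms) to inject at a call site of
// type my_type. Taking rpc_type and delay_ms as arguments, instead of reading
// the flags inside, lets a caller supply values that change at runtime.
inline int32_t ComputeRpcDelayMs(RpcCallType my_type, int32_t rpc_type,
                                 int32_t delay_ms) {
  assert(delay_ms >= 0);
  if (rpc_type == RPC_NULL) return 0;  // injection switched off
  int32_t target = rpc_type;
  if (target == RPC_RANDOM) {
    // Assumed behavior: pick one of the real call types, i.e. a value
    // strictly between RPC_NULL and RPC_RANDOM.
    target = 1 + std::rand() % (RPC_RANDOM - 1);
  }
  return target == my_type ? delay_ms : 0;
}

// Sleeps for the computed delay, if any.
inline void InjectRpcDelay(RpcCallType my_type, int32_t rpc_type,
                           int32_t delay_ms) {
  const int32_t ms = ComputeRpcDelayMs(my_type, rpc_type, delay_ms);
  if (ms > 0) std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}
```

A call site would pass the current flag values, e.g. FLAGS_fault_injection_rpc_type and FLAGS_fault_injection_rpc_delay_ms as in the line quoted above, so setting the type to RPC_NULL turns the injection off without touching the delay setting.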


http://gerrit.cloudera.org:8080/#/c/3343/21/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

Line 119:     self.execute_query_verify_metrics(self.TEST_QUERY, 10)
> how long do all these tests take to execute?  let's run them only in exhaus
About 5 minutes. OK, I'll change them to run only in exhaustive mode.


-- 
To view, visit http://gerrit.cloudera.org:8080/3343

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 21
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <[email protected]>
Gerrit-Reviewer: Alan Choi <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Juan Yu <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
