Henry Robinson has posted comments on this change. Change subject: IMPALA-3575: Add retry to backend connection request and rpc timeout ......................................................................
Patch Set 17: (18 comments) Getting close, I like the way you've done the fault injection. http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: PS17, Line 203: if (client_ != NULL) { : if (!has_rpc_error_) { : client_cache_->ReleaseClient(&client_); : } else { : client_cache_->DestroyClient(&client_); : } : } is this path covered by your tests - i.e. the connection will be reopened in the error case? Line 301: /// Indicate the rpc call sent by this connection succeeds or not. If the rpc call fails last rpc call? PS17, Line 304: has_rpc_error_ I would call this something more descriptive, like "cnxn_not_recoverable_". http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS17, Line 233: (timeout_ms == 0) ? 0 : can remove this, and just check timeout_ms in the while condition. PS17, Line 236: deadline timeout_ms http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/service/fragment-exec-state.cc File be/src/service/fragment-exec-state.cc: PS17, Line 116: can_retry call this "retry_is_safe" ? http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/service/fragment-mgr.cc File be/src/service/fragment-mgr.cc: PS17, Line 37: #ifndef NDEBUG : DECLARE_int32(fault_injection_rpc_delay_ms); : DECLARE_int32(fault_injection_rpc_type); : #define FAULT_INJECTION_RPC_DALAY(type) InjectRpcDelay(type, \ : FLAGS_fault_injection_rpc_type, FLAGS_fault_injection_rpc_delay_ms) : #else : #define FAULT_INJECTION_RPC_DALAY(type) : #endif would it be better to put this in fault-injection-util.h? http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS17, Line 186: DALAY DELAY http://gerrit.cloudera.org:8080/#/c/3343/17/be/src/testutil/fault-injection-util.h File be/src/testutil/fault-injection-util.h: PS17, Line 15: _ nit: remove trailing _ http://gerrit.cloudera.org:8080/#/c/3343/17/tests/custom_cluster/test_rpc_timeout.py File tests/custom_cluster/test_rpc_timeout.py: PS17, Line 34: i nit: use _ to show that we don't care about the variable PS17, Line 61: test_runtime_filter_rpc won't pytest try to run this as a test, since it's called test_*? PS17, Line 72: 3000 is this definitely large enough to time out? I don't see if you verify that the query failed as expected - that seems important. PS17, Line 72: - just want to check this works as intended with a missing '-' PS17, Line 73: rpc_timeout1 nit: can you call these "test_execplanfragment_timeout" etc? http://gerrit.cloudera.org:8080/#/c/3343/17/tests/query_test/test_lifecycle.py File tests/query_test/test_lifecycle.py: PS17, Line 36: \ not needed PS17, Line 52: \ remove http://gerrit.cloudera.org:8080/#/c/3343/17/tests/verifiers/metric_verifier.py File tests/verifiers/metric_verifier.py: Line 61: def __init__(self, impalad_service, metric_name): please add a class comment saying what this is for PS17, Line 68: wait_for_metric_reset Is this class really necessary, or could we use MetricVerifier and always pass in init_value to wait_for_metric()? -- To view, visit http://gerrit.cloudera.org:8080/3343 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964 Gerrit-PatchSet: 17 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Juan Yu <[email protected]> Gerrit-Reviewer: Alan Choi <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-Reviewer: Juan Yu <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
