Are the RPC heartbeat checks working right?

When I run the java-exec unit test class TestExampleQueries, it seems to
run fine up to some point (e.g., method testWhere, but it varies).  Then I
get CONNECTION ERROR and SYSTEM ERROR exceptions for the next five or so
test methods.  After that, every remaining test method takes 50 seconds
(the JUnit timeout); that is, 50 seconds elapse between successive
"Running org.apache.drill.TestExampleQueries#..." lines.  About 41 minutes
later, the run hangs (for at least 50 minutes).

It seems that the RPC heartbeat checks might be timing out even when the
server is still active, as if the sending of heartbeat pings were being
blocked by something else going on in the server.
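
To illustrate the kind of blocking suspected here, the following is a
minimal sketch and not Drill's actual RPC code.  The class name, method
name, and timing values are all hypothetical.  It assumes pings are sent
from a single-threaded scheduler and shows that a long-running task on
that same thread delays the pings by roughly the task's duration, while
a dedicated ping thread keeps the gaps short.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Drill.
public class HeartbeatStarvationSketch {

    /**
     * Returns the longest observed gap (in ms) between heartbeat pings
     * while a blocking task runs either on the ping thread itself
     * (shareThread == true) or on a separate thread.
     */
    public static long longestPingGapMillis(boolean shareThread,
                                            long pingIntervalMs,
                                            long blockMs) {
        ScheduledExecutorService pingLoop =
                Executors.newSingleThreadScheduledExecutor();
        ScheduledExecutorService workLoop = shareThread
                ? pingLoop
                : Executors.newSingleThreadScheduledExecutor();
        AtomicLong lastPing = new AtomicLong(System.nanoTime());
        AtomicLong longestGap = new AtomicLong(0);
        try {
            // Stand-in for sending a heartbeat ping: record the time
            // elapsed since the previous ping.
            pingLoop.scheduleAtFixedRate(() -> {
                long now = System.nanoTime();
                long gap = now - lastPing.getAndSet(now);
                longestGap.accumulateAndGet(gap, Math::max);
            }, 0, pingIntervalMs, TimeUnit.MILLISECONDS);

            // Stand-in for long-running server work; wait for it to end.
            workLoop.submit(() -> {
                Thread.sleep(blockMs);
                return null;
            }).get();

            // Give the first ping after the blocking task time to run.
            Thread.sleep(pingIntervalMs * 3);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pingLoop.shutdownNow();
            workLoop.shutdownNow();
        }
        return TimeUnit.NANOSECONDS.toMillis(longestGap.get());
    }

    public static void main(String[] args) {
        long heartbeatTimeoutMs = 200;  // assumed timeout, for illustration
        long shared = longestPingGapMillis(true, 20, 400);
        long dedicated = longestPingGapMillis(false, 20, 400);
        System.out.println("shared thread:    longest gap " + shared
                + " ms, timed out: " + (shared > heartbeatTimeoutMs));
        System.out.println("dedicated thread: longest gap " + dedicated
                + " ms, timed out: " + (dedicated > heartbeatTimeoutMs));
    }
}
```

If something like the shared-thread case is happening, the peer would
see a heartbeat timeout even though the server never stopped working.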

It also seems that there might be a problem in canceling queries and/or
closing connections in response to tests being terminated by timeouts (or
failures).  (There have been other reports of client-side CONNECTION
ERROR exceptions resulting from server-side exceptions that wouldn't be
expected to cause either side to close the connection.)



Daniel
--
Daniel Barclay
MapR Technologies
