Are the RPC heartbeat checks working right? When I run the java-exec unit test class TestExampleQueries, it seems to run fine up to some point (e.g., method testWhere, though the point varies). Then I get CONNECTION ERROR and SYSTEM ERROR exceptions for the next 5 or so test methods. After that, every remaining test method takes 50 seconds, which is the JUnit timeout (that is, 50 seconds elapse between successive "Running org.apache.drill.TestExampleQueries#..." lines). Then, 41 minutes later, the run hangs (for at least 50 minutes).
It seems that the RPC heartbeat checks may be timing out even when the server is still active, as if the sending of heartbeat pings were being blocked by something else going on in the server. It also seems that there might be a problem in canceling queries and/or closing connections in reaction to termination from test timeouts (or failures). (There have been other reports of client-side CONNECTION ERROR exceptions resulting from server-side exceptions that wouldn't be expected to cause either side to close the connection.)

Daniel

--
Daniel Barclay
MapR Technologies
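P.S. To make the blocked-heartbeat guess concrete, here is a minimal sketch of how pings could stall while the process is still alive. It is not Drill's RPC code: the class name, the constants, and the 250 ms timeout are all made up for illustration, and it assumes (without my having confirmed it) that pings might share a thread with some long-running work. It uses only java.util.concurrent.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Drill.
class HeartbeatStarvationSketch {
    static final long PING_INTERVAL_MS = 50;       // assumed ping period
    static final long BLOCKING_WORK_MS = 500;      // assumed long task on the "server"
    static final long HEARTBEAT_TIMEOUT_MS = 250;  // assumed peer-side timeout

    // Returns the longest observed gap between two consecutive pings.
    static long maxHeartbeatGapMillis(boolean shareThreadWithWork) {
        ScheduledExecutorService pinger = Executors.newSingleThreadScheduledExecutor();
        // Either reuse the ping thread for the blocking work, or give the work its own thread.
        ScheduledExecutorService worker =
            shareThreadWithWork ? pinger : Executors.newSingleThreadScheduledExecutor();
        AtomicLong lastPing = new AtomicLong(System.nanoTime());
        AtomicLong maxGap = new AtomicLong(0);
        try {
            // Stand-in for "send a ping": just record how long since the previous one.
            pinger.scheduleAtFixedRate(() -> {
                long now = System.nanoTime();
                long gap = now - lastPing.getAndSet(now);
                maxGap.accumulateAndGet(gap, Math::max);
            }, PING_INTERVAL_MS, PING_INTERVAL_MS, TimeUnit.MILLISECONDS);

            sleepQuietly(2 * PING_INTERVAL_MS);                    // let a couple of pings go out
            worker.execute(() -> sleepQuietly(BLOCKING_WORK_MS));  // the "something else going on"
            sleepQuietly(BLOCKING_WORK_MS + 4 * PING_INTERVAL_MS); // wait for work and later pings
        } finally {
            pinger.shutdownNow();
            worker.shutdownNow();
        }
        return TimeUnit.NANOSECONDS.toMillis(maxGap.get());
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status and return early
        }
    }

    public static void main(String[] args) {
        for (boolean shared : new boolean[] {true, false}) {
            long gap = maxHeartbeatGapMillis(shared);
            System.out.println((shared ? "shared thread" : "dedicated thread")
                + ": max ping gap " + gap + " ms, peer would time out: "
                + (gap > HEARTBEAT_TIMEOUT_MS));
        }
    }
}
```

In the shared-thread case the ping gap grows to at least the length of the blocking task, so a peer with a shorter timeout would declare the connection dead even though the process never stopped. If something like this is happening in the server, a thread dump taken during one of the 50-second stalls should show what the thread responsible for pings is busy with.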
