Joe McDonnell created IMPALA-8322:
-------------------------------------

             Summary: S3 tests encounter "timed out waiting for receiver fragment instance"
                 Key: IMPALA-8322
                 URL: https://issues.apache.org/jira/browse/IMPALA-8322
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.3.0
            Reporter: Joe McDonnell


This has been seen multiple times when running S3 tests:
{noformat}
query_test/test_join_queries.py:57: in test_basic_joins
    self.run_test_case('QueryTest/joins', new_vector)
common/impala_test_suite.py:472: in run_test_case
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:699: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:174: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:183: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:360: in __execute_query
    self.wait_for_finished(handle)
beeswax/impala_beeswax.py:381: in wait_for_finished
    raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    Query aborted:Sender 127.0.0.1 timed out waiting for receiver fragment instance: 6c40d992bb87af2f:0ce96e5d00000007, dest node: 4{noformat}
This is related to IMPALA-6818. On a bad run, there are various timeouts in the impalad logs:
{noformat}
I0316 10:47:16.359313 20175 krpc-data-stream-mgr.cc:354] Sender 127.0.0.1 timed out waiting for receiver fragment instance: ef4a5dc32a6565bd:a8720b8500000007, dest node: 5
I0316 10:47:16.359345 20175 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:40030 (request call id 14881) took 120182ms. Request Metrics: {}
I0316 10:47:16.359380 20175 krpc-data-stream-mgr.cc:354] Sender 127.0.0.1 timed out waiting for receiver fragment instance: d148d83e11a4603d:54dc35f700000004, dest node: 3
I0316 10:47:16.359395 20175 rpcz_store.cc:265] Call impala.DataStreamService.TransmitData from 127.0.0.1:40030 (request call id 14880) took 123097ms. Request Metrics: {}
... various messages ...
I0316 10:47:56.364990 20154 kudu-util.h:108] Cancel() RPC failed: Timed out: CancelQueryFInstances RPC to 127.0.0.1:27000 timed out after 10.000s (SENT)
... various messages ...
W0316 10:48:15.056421 20150 rpcz_store.cc:251] Call impala.ControlService.CancelQueryFInstances from 127.0.0.1:40912 (request call id 202) took 48695ms (client timeout 10000).
W0316 10:48:15.056473 20150 rpcz_store.cc:255] Trace:
0316 10:47:26.361265 (+ 0us) impala-service-pool.cc:165] Inserting onto call queue
0316 10:47:26.361285 (+ 20us) impala-service-pool.cc:245] Handling call
0316 10:48:15.056398 (+48695113us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0316 10:48:15.057087 20139 connection.cc:584] Got response to call id 202 after client already timed out or cancelled{noformat}
So far, this has only happened on S3. The system load at the time is not higher than normal; if anything, it is lower.
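The ~120s TransmitData latencies above line up with the default sender timeout (the datastream_sender_timeout_ms flag, 120000ms by default), which suggests the receiver fragment instance never registered within that window rather than the sender being slow. For triaging many log files, a log scan can pair each sender timeout with its corresponding slow TransmitData call. A minimal sketch, assuming the single-line glog format shown above; the script itself is hypothetical and not part of the test suite:
{code:python}
import re
import sys

# Hypothetical triage helper: scan an impalad log for sender timeouts and the
# slow TransmitData RPCs they correspond to. The line formats are taken from
# the log excerpt above; everything else is an assumption.
TIMEOUT_RE = re.compile(
    r"krpc-data-stream-mgr\.cc:\d+\] Sender (\S+) timed out waiting for "
    r"receiver fragment instance: (\S+), dest node: (\d+)")
RPC_RE = re.compile(
    r"rpcz_store\.cc:\d+\] Call impala\.DataStreamService\.TransmitData "
    r"from \S+ \(request call id (\d+)\) took (\d+)ms")

def scan(log_path):
    with open(log_path) as f:
        for line in f:
            m = TIMEOUT_RE.search(line)
            if m:
                print("sender timeout: sender=%s instance=%s dest_node=%s"
                      % m.groups())
            m = RPC_RE.search(line)
            if m:
                call_id, took_ms = m.groups()
                print("slow TransmitData: call_id=%s took=%sms"
                      % (call_id, took_ms))

if __name__ == "__main__":
    scan(sys.argv[1])
{code}
Running this over the logs of a bad run should show each "Sender ... timed out" line immediately followed by a TransmitData call that took just over 120000ms, as in the excerpt above.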