Quanlong Huang created IMPALA-10621:
---------------------------------------

             Summary: 
TestSpillingDebugActionDimensions.test_spilling_large_rows hanging
                 Key: IMPALA-10621
                 URL: https://issues.apache.org/jira/browse/IMPALA-10621
             Project: IMPALA
          Issue Type: Bug
            Reporter: Quanlong Huang


Saw a possible haning in test_spilling_large_rows in a downstream exhaustive 
release build. The query didn't finish in 12mins:
{code:java}
query_test/test_spilling.py:83: in test_spilling_large_rows
    self.run_test_case('QueryTest/spilling-large-rows', vector, unique_database)
common/impala_test_suite.py:665: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:205: in execute
    result = self.fetch_results(query_string, handle)
beeswax/impala_beeswax.py:452: in fetch_results
    exec_result = self.__fetch_results(query_handle, max_rows)
beeswax/impala_beeswax.py:463: in __fetch_results
    results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, 
fetch_rows))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
E    MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to 
unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max 
allowed lag is 720000ms) {code}
impalad.INFO indicates it's waiting for the coordinator fragment to finish:
{code:java}
I0325 23:54:16.146754 16521 Frontend.java:1600] 
b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_6df9f183
I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] 
b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
I0325 23:54:16.147109 16521 Frontend.java:1643] 
b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
I0325 23:54:16.148593 20605 admission-controller.cc:1747] 
b94d5cd2894cbe68:68acfd6300000000] Trying to admit 
id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool 
executor_group_name=default per_host_mem_estimate=309.98 MB 
dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 
max_mem=-1.00 B
I0325 23:54:16.148609 20605 admission-controller.cc:1755] 
b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, 
agg_mem_reserved=1.00 GB,  local_host(local_mem_admitted=0, 
num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: 
queries=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, 
cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_level_stats: 
num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0325 23:54:16.148628 20605 admission-controller.cc:1215] 
b94d5cd2894cbe68:68acfd6300000000] Admitting query 
id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.148648 20605 impala-server.cc:2150] 
b94d5cd2894cbe68:68acfd6300000000] Registering query locations
I0325 23:54:16.148655 20605 coordinator.cc:150] 
b94d5cd2894cbe68:68acfd6300000000] Exec() 
query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr
I0325 23:54:16.148877 20605 coordinator.cc:474] 
b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149272 27701 control-service.cc:148] 
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000 
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
#instances=3
I0325 23:54:16.149561 20605 coordinator.cc:533] 
b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149663 20609 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000002] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
I0325 23:54:16.149741 20611 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000000] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0325 23:54:16.149821 20613 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000004] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
I0325 23:54:16.795727 20609 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000002] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] 
get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: 
host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 
remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] 
query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: 
impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
I0325 23:54:16.968387 20613 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000004] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query 
id=b94d5cd2894cbe68:68acfd6300000000 execution completed
I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for 
backends to finish, 1 remaining. 
query_id=b94d5cd2894cbe68:68acfd6300000000{code}
After 12mins, it reaches the timeout limit:
{code:java}
I0326 00:06:54.688612 28694 status.cc:82] Query 
b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 
127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms)
    @           0xc93371
    @          0x1398edc
    @          0x138fd14
    @          0x1596d91
    @          0x159820a
    @          0x1e20f11
    @     0x7f57d79ede24
    @     0x7f57d443734c
I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): 
cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] 
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: 
Sending CancelQueryFInstances rpc
I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: 
query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] 
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not 
cancelling because the backend is already done:
I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() 
query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control 
resources for query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] 
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Not 
cancelling because the backend is already done: Cancelled
I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] 
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not 
cancelling because the backend is already done:
I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() 
query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
{code}
This seems similar to IMPALA-10339.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to