Quanlong Huang created IMPALA-10621:
---------------------------------------
Summary:
TestSpillingDebugActionDimensions.test_spilling_large_rows hanging
Key: IMPALA-10621
URL: https://issues.apache.org/jira/browse/IMPALA-10621
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang
Saw a possible haning in test_spilling_large_rows in a downstream exhaustive
release build. The query didn't finish in 12mins:
{code:java}
query_test/test_spilling.py:83: in test_spilling_large_rows
self.run_test_case('QueryTest/spilling-large-rows', vector, unique_database)
common/impala_test_suite.py:665: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:205: in execute
result = self.fetch_results(query_string, handle)
beeswax/impala_beeswax.py:452: in fetch_results
exec_result = self.__fetch_results(query_handle, max_rows)
beeswax/impala_beeswax.py:463: in __fetch_results
results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False,
fetch_rows))
beeswax/impala_beeswax.py:520: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E ImpalaBeeswaxException: ImpalaBeeswaxException:
E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
E MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to
unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max
allowed lag is 720000ms) {code}
impalad.INFO indicates it's waiting for the coordinator fragment to finish:
{code:java}
I0325 23:54:16.146754 16521 Frontend.java:1600]
b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_6df9f183
I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110]
b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
I0325 23:54:16.147109 16521 Frontend.java:1643]
b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
I0325 23:54:16.148593 20605 admission-controller.cc:1747]
b94d5cd2894cbe68:68acfd6300000000] Trying to admit
id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool
executor_group_name=default per_host_mem_estimate=309.98 MB
dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200
max_mem=-1.00 B
I0325 23:54:16.148609 20605 admission-controller.cc:1755]
b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0,
agg_mem_reserved=1.00 GB, local_host(local_mem_admitted=0,
num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats:
queries=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000,
cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_level_stats:
num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0325 23:54:16.148628 20605 admission-controller.cc:1215]
b94d5cd2894cbe68:68acfd6300000000] Admitting query
id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.148648 20605 impala-server.cc:2150]
b94d5cd2894cbe68:68acfd6300000000] Registering query locations
I0325 23:54:16.148655 20605 coordinator.cc:150]
b94d5cd2894cbe68:68acfd6300000000] Exec()
query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr
I0325 23:54:16.148877 20605 coordinator.cc:474]
b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149272 27701 control-service.cc:148]
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances():
query_id=b94d5cd2894cbe68:68acfd6300000000
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
#instances=3
I0325 23:54:16.149561 20605 coordinator.cc:533]
b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149663 20609 query-state.cc:948]
b94d5cd2894cbe68:68acfd6300000002] Executing instance.
instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
I0325 23:54:16.149741 20611 query-state.cc:948]
b94d5cd2894cbe68:68acfd6300000000] Executing instance.
instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0325 23:54:16.149821 20613 query-state.cc:948]
b94d5cd2894cbe68:68acfd6300000004] Executing instance.
instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
I0325 23:54:16.795727 20609 query-state.cc:957]
b94d5cd2894cbe68:68acfd6300000002] Instance completed.
instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308]
b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr():
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215]
get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed:
host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001
remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371]
query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend:
impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
I0325 23:54:16.968387 20613 query-state.cc:957]
b94d5cd2894cbe68:68acfd6300000004] Instance completed.
instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308]
b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr():
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query
id=b94d5cd2894cbe68:68acfd6300000000 execution completed
I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for
backends to finish, 1 remaining.
query_id=b94d5cd2894cbe68:68acfd6300000000{code}
After 12mins, it reaches the timeout limit:
{code:java}
I0326 00:06:54.688612 28694 status.cc:82] Query
b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend:
127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms)
@ 0xc93371
@ 0x1398edc
@ 0x138fd14
@ 0x1596d91
@ 0x159820a
@ 0x1e20f11
@ 0x7f57d79ede24
@ 0x7f57d443734c
I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool():
cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974]
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000:
Sending CancelQueryFInstances rpc
I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances():
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState:
query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
I0326 00:06:54.688993 27708 query-state.cc:974] Cancel:
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974]
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not
cancelling because the backend is already done:
I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends()
query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control
resources for query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery():
query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974]
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Not
cancelling because the backend is already done: Cancelled
I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974]
query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not
cancelling because the backend is already done:
I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends()
query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
{code}
This seems similar to IMPALA-10339.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]