[ https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235825#comment-17235825 ]
Tim Armstrong commented on IMPALA-10339: ---------------------------------------- I saw a crash on a very similar query that fails with repartitioning: {noformat} 74: client_identifier (string) = "query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:be eswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none]", ... I1117 00:32:24.636593 8768 Frontend.java:1522] 484209f4cc06bb05:eec7d58d00000000] Analyzing query: select straight_join * from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber where l1.l_orderkey < 100000 order by l1.l_orderkey desc, l1.l_linenumber desc limit 10 db: tpch_parquet {noformat} This is this test from ./testdata/workloads/functional-query/queries/QueryTest/spilling.test: {noformat} ==== ---- QUERY # Test spilling join with many duplicates in join key. We don't expect this to succeed # with a memory constraint: see IMPALA-4857. Limit size of probe so that query doesn't # bog down executing an exploding join. # The additional "order by" and "limit" clauses make sure that a successful # query does not too much data to the client. set buffer_pool_limit=167m; select straight_join * from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber where l1.l_orderkey < 100000 order by l1.l_orderkey desc, l1.l_linenumber desc limit 10 ---- CATCH Repartitioning did not reduce the size of a spilled partition ==== {noformat} Backtrace on crash is: {noformat} F1117 00:33:09.914166 18122 query-state.cc:877] 484209f4cc06bb05:eec7d58d00000000] Check failed: is_cancelled_.Load() == 1 (0 vs. 1) *** Check failure stack trace: *** @ 0x520b32c google::LogMessage::Fail() @ 0x520cc1c google::LogMessage::SendToLog() @ 0x520ac8a google::LogMessage::Flush() @ 0x520e888 google::LogMessageFatal::~LogMessageFatal() @ 0x228786f impala::QueryState::MonitorFInstances() @ 0x2277010 impala::QueryExecMgr::ExecuteQueryHelper() @ 0x227f924 boost::_mfi::mf1<>::operator()() @ 0x227f1ed boost::_bi::list2<>::operator()<>() @ 0x227e7f4 boost::_bi::bind_t<>::operator()() @ 0x227dc0e boost::detail::function::void_function_obj_invoker0<>::invoke() @ 0x21438c3 boost::function0<>::operator()() @ 0x2721a9b impala::Thread::SuperviseThread() @ 0x2729a38 boost::_bi::list5<>::operator()<>() @ 0x272995c boost::_bi::bind_t<>::operator()() @ 0x272991d boost::detail::thread_data<>::run() @ 0x3f11e61 thread_proxy @ 0x7f460b2ffe24 start_thread @ 0x7f4607d9634c __clone Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30001,server=y,suspend=n Wrote minidump to /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/logs/ee_tests/minidumps/impalad/e3c36842-f700-42a5-575dd186-f97a0aed.dmp {noformat} > Apparent hang in > TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action > ---------------------------------------------------------------------------------- > > Key: IMPALA-10339 > URL: https://issues.apache.org/jira/browse/IMPALA-10339 > Project: IMPALA > Issue Type: Bug > Components: Backend > Affects Versions: Impala 4.0 > Reporter: Tim Armstrong > Assignee: Tim Armstrong > Priority: Blocker > Labels: broken-build, flaky, hang > > Release build with this commit as the tip: > {noformat} > commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging) > Author: wzhou-code <wz...@cloudera.com> > Date: Thu Oct 29 22:32:03 2020 -0700 > IMPALA-10305: Sync Kudu's FIPS compliant changes > > {noformat} > {noformat} > Regression > query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol: > beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} > | table_format: parquet/none] (from pytest) > Failing for the past 1 build (Since Failed#100 ) > Took 1 hr 59 min. > add description > Error Message > query_test/test_spilling.py:134: in test_spilling_no_debug_action > self.run_test_case('QueryTest/spilling-no-debug-action', vector) > common/impala_test_suite.py:668: in run_test_case > self.__verify_exceptions(test_section['CATCH'], str(e), use_db) > common/impala_test_suite.py:485: in __verify_exceptions (expected_str, > actual_str) E AssertionError: Unexpected exception string. Expected: > row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did > not reduce the size of a spilled partition.* E Not found in actual: Timeout > >7200s > Stacktrace > query_test/test_spilling.py:134: in test_spilling_no_debug_action > self.run_test_case('QueryTest/spilling-no-debug-action', vector) > common/impala_test_suite.py:668: in run_test_case > self.__verify_exceptions(test_section['CATCH'], str(e), use_db) > common/impala_test_suite.py:485: in __verify_exceptions > (expected_str, actual_str) > E AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot > perform hash join at node with id .*. Repartitioning did not reduce the size > of a spilled partition.* > E Not found in actual: Timeout >7200s > Standard Error > SET > client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none]; > -- executing against localhost:21000 > use tpch_parquet; > -- 2020-11-11 23:12:04,319 INFO MainThread: Started query > c740c1c66d9679a9:6a40f16100000000 > SET > client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none]; > SET mt_dop=0; > SET default_spillable_buffer_size=64k; > -- 2020-11-11 23:12:04,320 INFO MainThread: Loading query test file: > /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test > -- 2020-11-11 23:12:04,323 INFO MainThread: Starting new HTTP connection > (1): localhost > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:04,377 INFO MainThread: Started query > c044afcf5ae44df9:a2e2e7c600000000 > -- executing against localhost:21000 > select straight_join count(*) > from > lineitem a, lineitem b > where > a.l_partkey = 1 and > a.l_orderkey = b.l_orderkey; > -- 2020-11-11 23:12:04,385 INFO MainThread: Started query > 314c019cd252f322:2411bc7600000000 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:05,199 INFO MainThread: Started query > 80424e68922c30f9:b2144dff00000000 > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:05,207 INFO MainThread: Started query > 2a4c1f4b26ea52da:4339f3ff00000000 > -- executing against localhost:21000 > select straight_join count(*) > from > lineitem a > where > a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10) > and a.l_partkey < 1000; > -- 2020-11-11 23:12:05,215 INFO MainThread: Started query > f845afd00a569446:79c5054a00000000 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:07,507 INFO MainThread: Started query > ee4f8a685928e7ef:830d965100000000 > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:07,512 INFO MainThread: Started query > 654a6eced9594931:cc68289a00000000 > -- executing against localhost:21000 > select straight_join count(*) > from > supplier right outer join lineitem on s_suppkey = l_suppkey > where s_acctbal > 0 and s_acctbal < 10; > -- 2020-11-11 23:12:07,519 INFO MainThread: Started query > 7a41a406b446e082:b5e3bf2f00000000 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:08,549 INFO MainThread: Started query > 1445a2833895ee1a:d136681000000000 > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:08,554 INFO MainThread: Started query > 4149ef276c643426:a285156400000000 > -- executing against localhost:21000 > select straight_join count(*) > from > supplier right outer join lineitem on s_suppkey = l_suppkey > where s_acctbal > 0 and s_acctbal < 10; > -- 2020-11-11 23:12:08,562 INFO MainThread: Started query > 58427e99dadb6ca9:0d184f2700000000 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:09,586 INFO MainThread: Started query > 1d498d1d50b86ed3:616ba0e600000000 > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:09,592 INFO MainThread: Started query > 4543a32357f9b4cb:e3dbfab800000000 > -- executing against localhost:21000 > with x as (select * from supplier limit 10) > select straight_join count(*) > from > x right anti join lineitem on s_suppkey + 100 = l_suppkey; > -- 2020-11-11 23:12:09,599 INFO MainThread: Started query > 9e4ba8fd3dbbf2bd:d73094e700000000 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:10,320 INFO MainThread: Started query > 62471da179150bd1:ffa7dd7000000000 > -- executing against localhost:21000 > set mem_limit=75m; > -- 2020-11-11 23:12:10,326 INFO MainThread: Started query > 8445afe55c819986:0915d3f400000000 > -- executing against localhost:21000 > select l_orderkey, group_concat(repeat(l_comment, 10)) comments > from lineitem > group by l_orderkey > order by comments desc > limit 5; > -- 2020-11-11 23:12:10,332 INFO MainThread: Started query > d84ba6a837b98c8d:c7c73fd500000000 > -- executing against localhost:21000 > SET MEM_LIMIT="0"; > -- 2020-11-11 23:12:10,638 INFO MainThread: Started query > 58476b9f3b524901:bed8346900000000 > -- executing against localhost:21000 > set topn_bytes_limit=-1; > -- 2020-11-11 23:12:10,642 INFO MainThread: Started query > 2742e503cfca610e:a4029e4800000000 > -- executing against localhost:21000 > set mem_limit=100m; > -- 2020-11-11 23:12:10,648 INFO MainThread: Started query > 9a4f67b8c8a95465:67f8143100000000 > -- executing against localhost:21000 > select * > from lineitem > order by l_orderkey desc > limit 6000000; > -- 2020-11-11 23:12:10,654 INFO MainThread: Started query > 7147f6f62984fdb2:614cab6a00000000 > -- executing against localhost:21000 > SET TOPN_BYTES_LIMIT="536870912"; > -- 2020-11-11 23:12:10,859 INFO MainThread: Started query > 4642e316a7f90110:c8344b0d00000000 > -- executing against localhost:21000 > SET MEM_LIMIT="0"; > -- 2020-11-11 23:12:10,863 INFO MainThread: Started query > 414724e3a0a7f290:d1a0e69000000000 > -- executing against localhost:21000 > set mem_limit=250m; > -- 2020-11-11 23:12:10,867 INFO MainThread: Started query > 854c0c0d1569c7f5:3635e5ae00000000 > -- executing against localhost:21000 > select straight_join * > from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber > order by l_tax desc > limit 5; > -- 2020-11-11 23:12:10,873 INFO MainThread: Started query > de444e75ef44ee0b:a1614bfe00000000 > ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140237730514688) > ~~~~~~~~~~~~~~~~~~~~~ > File > "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", > line 277, in _perform_spawn > reply.run() > File > "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", > line 213, in run > self._result = func(*args, **kwargs) > File > "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", > line 954, in _thread_receiver > msg = Message.from_io(io) > File > "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", > line 418, in from_io > header = io.read(9) # type 1, channel 4, payload 4 > File > "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py", > line 386, in read > data = self._read(numbytes-len(buf)) > -- executing against localhost:21000 > SET MEM_LIMIT="0"; > -- 2020-11-12 01:12:03,717 INFO MainThread: Started query > 8d4eac3249e55996:2b2053a700000000 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org