[ 
https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235825#comment-17235825
 ] 

Tim Armstrong commented on IMPALA-10339:
----------------------------------------

I saw a crash on a very similar query, one that is expected to fail with a repartitioning error:

{noformat}
  74: client_identifier (string) = "query_test/test_spilling.py::TestSpillingDebugActionDimensions::()::test_spilling[protocol:beeswax|exec_option:{'mt_dop':0;'debug_action':None;'default_spillable_buffer_size':'256k'}|table_format:parquet/none]",
...
I1117 00:32:24.636593  8768 Frontend.java:1522] 484209f4cc06bb05:eec7d58d00000000] Analyzing query: select straight_join *
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
where l1.l_orderkey < 100000
order by l1.l_orderkey desc, l1.l_linenumber desc limit 10 db: tpch_parquet
{noformat}

This is the following test from ./testdata/workloads/functional-query/queries/QueryTest/spilling.test:
{noformat}
====
---- QUERY
# Test spilling join with many duplicates in join key. We don't expect this to succeed
# with a memory constraint: see IMPALA-4857. Limit size of probe so that query doesn't
# bog down executing an exploding join.
# The additional "order by" and "limit" clauses make sure that a successful
# query does not too much data to the client.
set buffer_pool_limit=167m;
select straight_join *
from lineitem l1 join lineitem l2 on l1.l_linenumber = l2.l_linenumber
where l1.l_orderkey < 100000
order by l1.l_orderkey desc, l1.l_linenumber desc limit 10
---- CATCH
Repartitioning did not reduce the size of a spilled partition
====
{noformat}
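
To illustrate why this query is expected to hit that error, here is a toy model. It is not Impala's actual code, and `toy_hash`, `repartition` and `check_repartition` are made-up names. The assumption it demonstrates is that hash repartitioning can only separate rows with different join keys, so a spilled partition whose rows all share one `l_linenumber` value comes back the same size however often it is re-split.

```python
from collections import defaultdict


def toy_hash(key, level):
    # Stand-in hash (an assumption for this sketch); a real engine would
    # use a different hash function or seed at each repartitioning level.
    return key * 31 + level * 17


def repartition(rows, key_col, fanout, level):
    """Split rows into at most `fanout` partitions by hashing the join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[toy_hash(row[key_col], level) % fanout].append(row)
    return list(parts.values())


def check_repartition(rows, key_col, fanout=16, level=1):
    """Return the largest child size, or raise if repartitioning made no progress."""
    largest = max(len(p) for p in repartition(rows, key_col, fanout, level))
    if largest >= len(rows):
        raise RuntimeError(
            "Repartitioning did not reduce the size of a spilled partition")
    return largest


# Every row shares one join key: all of them hash to the same child partition.
dup_rows = [{"l_linenumber": 1, "payload": i} for i in range(1000)]
# Distinct join keys: the rows spread out across the child partitions.
uniq_rows = [{"l_linenumber": i, "payload": i} for i in range(1000)]
```

With `dup_rows` the check raises, and with `uniq_rows` it returns a size well under 1000.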

Backtrace on crash is:
{noformat}
F1117 00:33:09.914166 18122 query-state.cc:877] 484209f4cc06bb05:eec7d58d00000000] Check failed: is_cancelled_.Load() == 1 (0 vs. 1) 
*** Check failure stack trace: ***
    @          0x520b32c  google::LogMessage::Fail()
    @          0x520cc1c  google::LogMessage::SendToLog()
    @          0x520ac8a  google::LogMessage::Flush()
    @          0x520e888  google::LogMessageFatal::~LogMessageFatal()
    @          0x228786f  impala::QueryState::MonitorFInstances()
    @          0x2277010  impala::QueryExecMgr::ExecuteQueryHelper()
    @          0x227f924  boost::_mfi::mf1<>::operator()()
    @          0x227f1ed  boost::_bi::list2<>::operator()<>()
    @          0x227e7f4  boost::_bi::bind_t<>::operator()()
    @          0x227dc0e  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @          0x21438c3  boost::function0<>::operator()()
    @          0x2721a9b  impala::Thread::SuperviseThread()
    @          0x2729a38  boost::_bi::list5<>::operator()<>()
    @          0x272995c  boost::_bi::bind_t<>::operator()()
    @          0x272991d  boost::detail::thread_data<>::run()
    @          0x3f11e61  thread_proxy
    @     0x7f460b2ffe24  start_thread
    @     0x7f4607d9634c  __clone
Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30001,server=y,suspend=n
Wrote minidump to /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/logs/ee_tests/minidumps/impalad/e3c36842-f700-42a5-575dd186-f97a0aed.dmp
{noformat}

> Apparent hang in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-10339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10339
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build, flaky, hang
>
> Release build with this commit as the tip:
> {noformat}
> commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging)
> Author: wzhou-code <wz...@cloudera.com>
> Date:   Thu Oct 29 22:32:03 2020 -0700
>     IMPALA-10305: Sync Kudu's FIPS compliant changes
>     
> {noformat}
> {noformat}
> Regression
> query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol: beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} | table_format: parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#100 )
> Took 1 hr 59 min.
> Error Message
> query_test/test_spilling.py:134: in test_spilling_no_debug_action     
> self.run_test_case('QueryTest/spilling-no-debug-action', vector) 
> common/impala_test_suite.py:668: in run_test_case     
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
> common/impala_test_suite.py:485: in __verify_exceptions     (expected_str, 
> actual_str) E   AssertionError: Unexpected exception string. Expected: 
> row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did 
> not reduce the size of a spilled partition.* E   Not found in actual: Timeout 
> >7200s
> Stacktrace
> query_test/test_spilling.py:134: in test_spilling_no_debug_action
>     self.run_test_case('QueryTest/spilling-no-debug-action', vector)
> common/impala_test_suite.py:668: in run_test_case
>     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:485: in __verify_exceptions
>     (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot 
> perform hash join at node with id .*. Repartitioning did not reduce the size 
> of a spilled partition.*
> E   Not found in actual: Timeout >7200s
> Standard Error
> SET 
> client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> -- executing against localhost:21000
> use tpch_parquet;
> -- 2020-11-11 23:12:04,319 INFO     MainThread: Started query 
> c740c1c66d9679a9:6a40f16100000000
> SET 
> client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none];
> SET mt_dop=0;
> SET default_spillable_buffer_size=64k;
> -- 2020-11-11 23:12:04,320 INFO     MainThread: Loading query test file: 
> /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
> -- 2020-11-11 23:12:04,323 INFO     MainThread: Starting new HTTP connection 
> (1): localhost
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:04,377 INFO     MainThread: Started query 
> c044afcf5ae44df9:a2e2e7c600000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a, lineitem b
> where
> a.l_partkey = 1 and
> a.l_orderkey = b.l_orderkey;
> -- 2020-11-11 23:12:04,385 INFO     MainThread: Started query 
> 314c019cd252f322:2411bc7600000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:05,199 INFO     MainThread: Started query 
> 80424e68922c30f9:b2144dff00000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:05,207 INFO     MainThread: Started query 
> 2a4c1f4b26ea52da:4339f3ff00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> lineitem a
> where
> a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10)
> and a.l_partkey < 1000;
> -- 2020-11-11 23:12:05,215 INFO     MainThread: Started query 
> f845afd00a569446:79c5054a00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:07,507 INFO     MainThread: Started query 
> ee4f8a685928e7ef:830d965100000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:07,512 INFO     MainThread: Started query 
> 654a6eced9594931:cc68289a00000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:07,519 INFO     MainThread: Started query 
> 7a41a406b446e082:b5e3bf2f00000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:08,549 INFO     MainThread: Started query 
> 1445a2833895ee1a:d136681000000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:08,554 INFO     MainThread: Started query 
> 4149ef276c643426:a285156400000000
> -- executing against localhost:21000
> select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> -- 2020-11-11 23:12:08,562 INFO     MainThread: Started query 
> 58427e99dadb6ca9:0d184f2700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:09,586 INFO     MainThread: Started query 
> 1d498d1d50b86ed3:616ba0e600000000
> -- executing against localhost:21000
> set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0";
> -- 2020-11-11 23:12:09,592 INFO     MainThread: Started query 
> 4543a32357f9b4cb:e3dbfab800000000
> -- executing against localhost:21000
> with x as (select * from supplier limit 10)
> select straight_join count(*)
> from
> x right anti join lineitem on s_suppkey + 100 = l_suppkey;
> -- 2020-11-11 23:12:09,599 INFO     MainThread: Started query 
> 9e4ba8fd3dbbf2bd:d73094e700000000
> -- executing against localhost:21000
> SET DEBUG_ACTION="";
> -- 2020-11-11 23:12:10,320 INFO     MainThread: Started query 
> 62471da179150bd1:ffa7dd7000000000
> -- executing against localhost:21000
> set mem_limit=75m;
> -- 2020-11-11 23:12:10,326 INFO     MainThread: Started query 
> 8445afe55c819986:0915d3f400000000
> -- executing against localhost:21000
> select l_orderkey, group_concat(repeat(l_comment, 10)) comments
> from lineitem
> group by l_orderkey
> order by comments desc
> limit 5;
> -- 2020-11-11 23:12:10,332 INFO     MainThread: Started query 
> d84ba6a837b98c8d:c7c73fd500000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,638 INFO     MainThread: Started query 
> 58476b9f3b524901:bed8346900000000
> -- executing against localhost:21000
> set topn_bytes_limit=-1;
> -- 2020-11-11 23:12:10,642 INFO     MainThread: Started query 
> 2742e503cfca610e:a4029e4800000000
> -- executing against localhost:21000
> set mem_limit=100m;
> -- 2020-11-11 23:12:10,648 INFO     MainThread: Started query 
> 9a4f67b8c8a95465:67f8143100000000
> -- executing against localhost:21000
> select *
> from lineitem
> order by l_orderkey desc
> limit 6000000;
> -- 2020-11-11 23:12:10,654 INFO     MainThread: Started query 
> 7147f6f62984fdb2:614cab6a00000000
> -- executing against localhost:21000
> SET TOPN_BYTES_LIMIT="536870912";
> -- 2020-11-11 23:12:10,859 INFO     MainThread: Started query 
> 4642e316a7f90110:c8344b0d00000000
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-11 23:12:10,863 INFO     MainThread: Started query 
> 414724e3a0a7f290:d1a0e69000000000
> -- executing against localhost:21000
> set mem_limit=250m;
> -- 2020-11-11 23:12:10,867 INFO     MainThread: Started query 
> 854c0c0d1569c7f5:3635e5ae00000000
> -- executing against localhost:21000
> select straight_join *
> from supplier join /* +broadcast */ lineitem on s_suppkey = l_linenumber
> order by l_tax desc
> limit 5;
> -- 2020-11-11 23:12:10,873 INFO     MainThread: Started query 
> de444e75ef44ee0b:a1614bfe00000000
> ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140237730514688) 
> ~~~~~~~~~~~~~~~~~~~~~
>   File 
> "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
>     reply.run()
>   File 
> "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py",
>  line 213, in run
>     self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
>     msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
>     header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/execnet/gateway_base.py",
>  line 386, in read
>     data = self._read(numbytes-len(buf))
> -- executing against localhost:21000
> SET MEM_LIMIT="0";
> -- 2020-11-12 01:12:03,717 INFO     MainThread: Started query 
> 8d4eac3249e55996:2b2053a700000000
> {noformat}
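
For reference, the assertion in the quoted description boils down to a regex search over the error string. A minimal sketch follows, assuming the `row_regex:` prefix marks the rest of the expected string as a regular expression. The helper name `matches_expected` is hypothetical, and this is not the actual `impala_test_suite.py` code.

```python
import re


def matches_expected(expected_str, actual_str):
    """Check an expected-exception string against the actual error text."""
    prefix = "row_regex:"
    if expected_str.startswith(prefix):
        # Treat everything after the prefix as a regular expression.
        return re.search(expected_str[len(prefix):], actual_str) is not None
    # Otherwise fall back to a plain substring match.
    return expected_str in actual_str


expected = ("row_regex:.*Cannot perform hash join at node with id .*. "
            "Repartitioning did not reduce the size of a spilled partition.*")
```

Under this sketch, `matches_expected(expected, "Timeout >7200s")` is false, which is why the test reports an unexpected exception string: the query timed out rather than failing with the repartitioning error.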


