[ 
https://issues.apache.org/jira/browse/IMPALA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186680#comment-17186680
 ] 

ASF subversion and git services commented on IMPALA-10050:
----------------------------------------------------------

Commit 3733c4cc2cfb78d7f13463fb1ee9e1c4560d4a3d in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3733c4c ]

IMPALA-10050: Fixed DCHECK error for backend in terminal state.

Recent patch for IMPALA-6788 makes coordinator to cancel inflight
query fragment instances when it receives failure report from one
backend. It's possible the BackendState::Cancel() is called for
one fragment instance before the first execution status report
from its backend is received and processed by the coordinator.
Since the status of BackendState is set as Cancelled after Cancel()
is called, the execution of the fragment instance is treated as
Done in such case so that the status report will NOT be processed.
Hence the backend receives response OK from coordinator even it
sent a report with execution error. This make backend hit DCHECK
error if backend in the terminal state with error.
This patch fixs the issue by making coordinator send CANCELLED
status in the response of status report if the backend status is not
ok and the execution status report is not applied.

Testing:
 - The issue could be reproduced by running test_failpoints for about
   20 iterations. Verified the fixing by running test_failpoints over
   200 iterations without DCHECK failure.
 - Passed TestProcessFailures::test_kill_coordinator.
 - Psssed TestRPCException::test_state_report_error.
 - Passed exhaustive tests.

Change-Id: Iba6a72f98c0f9299c22c58830ec5a643335b966a
Reviewed-on: http://gerrit.cloudera.org:8080/16303
Reviewed-by: Thomas Tauber-Marshall <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> DCHECK was hit possibly while executing TestFailpoints::test_failpoints
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-10050
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10050
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Attila Jeges
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build, crash, flaky
>             Fix For: Impala 4.0
>
>
> A DCHECK was hit during  ASAN core e2e tests. Time-frame suggests that it 
> happened while executing TestFailpoints::test_failpoints e2e test.
> {code}
> 10:56:38  TestFailpoints.test_failpoints[protocol: beeswax | table_format: 
> avro/snap/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
> location: PREPARE | action: MEM_LIMIT_EXCEEDED | query: select 1 from 
> alltypessmall a join alltypessmall b on a.id = b.id] 
> 10:56:38 failure/test_failpoints.py:128: in test_failpoints
> 10:56:38     self.execute_query(query, vector.get_value('exec_option'))
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:811:
>  in wrapper
> 10:56:38     return function(*args, **kwargs)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:843:
>  in execute_query
> 10:56:38     return self.__execute_query(self.client, query, query_options)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_test_suite.py:909:
>  in __execute_query
> 10:56:38     return impalad_client.execute(query, user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/common/impala_connection.py:205:
>  in execute
> 10:56:38     return self.__beeswax_client.execute(sql_stmt, user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:187:
>  in execute
> 10:56:38     handle = self.__execute_query(query_string.strip(), user=user)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:365:
>  in __execute_query
> 10:56:38     self.wait_for_finished(handle)
> 10:56:38 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/beeswax/impala_beeswax.py:386:
>  in wait_for_finished
> 10:56:38     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> 10:56:38 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 10:56:38 E    Query aborted:RPC from 127.0.0.1:27000 to 127.0.0.1:27002 failed
> 10:56:38 E   TransmitData() to 127.0.0.1:27002 failed: Network error: Client 
> connection negotiation failed: client connection to 127.0.0.1:27002: connect: 
> Connection refused (error 111)
> {code}
> Impalad log:
> {code}
> Log file created at: 2020/08/05 01:52:56
> Running on machine: 
> impala-ec2-centos74-r5-4xlarge-ondemand-017c.vpc.cloudera.com
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F0805 01:52:56.979769 17313 query-state.cc:803] 
> 3941a3d92a71e242:15c963f300000000] Check failed: is_cancelled_.Load() == 1 (0 
> vs. 1) 
> {code}
> Stack trace
> {code}
> Thread 368 (crashed)
>  0  libc-2.17.so + 0x351f7
>     rax = 0x0000000000000000   rdx = 0x0000000000000006
>     rcx = 0xffffffffffffffff   rbx = 0x0000000000000004
>     rsi = 0x00000000000043a1   rdi = 0x00000000000037e4
>     rbp = 0x00007efcd4c53080   rsp = 0x00007efcd4c52d08
>      r8 = 0x0000000000000000    r9 = 0x00007efcd4c52b80
>     r10 = 0x0000000000000008   r11 = 0x0000000000000206
>     r12 = 0x00000000093de7c0   r13 = 0x0000000000000086
>     r14 = 0x00000000093de7c4   r15 = 0x00000000093d6de0
>     rip = 0x00007f05c9d231f7
>     Found by: given as instruction pointer in context
>  1  impalad!google::LogMessage::Flush() + 0x1eb
>     rbp = 0x00007efcd4c53250   rsp = 0x00007efcd4c53090
>     rip = 0x0000000005727e3b
>     Found by: previous frame's frame pointer
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
>     rbx = 0x00007efcd4c532a0   rbp = 0x00007efcd4c53310
>     rsp = 0x00007efcd4c53130   r12 = 0x00000fe01a982628
>     r13 = 0x000061d000da0a6c   r14 = 0x00007efcd4c53250
>     r15 = 0x00007efcd4c53270   rip = 0x000000000572ba39
>     Found by: call frame info
>  3  impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 803 + 
> 0x45]
>     rbx = 0x00007efcd4c532a0   rbp = 0x00007efcd4c53310
>     rsp = 0x00007efcd4c53140   r12 = 0x00000fe01a982628
>     r13 = 0x000061d000da0a6c   r14 = 0x00007efcd4c53250
>     r15 = 0x00007efcd4c53270   rip = 0x00000000024c1713
>     Found by: call frame info
>  4  impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*) 
> [query-exec-mgr.cc : 161 + 0x8]
>     rbx = 0x000061d000da0280   rbp = 0x00007efcd4c53330
>     rsp = 0x00007efcd4c53320   r12 = 0x00007efcd4c53360
>     r13 = 0x00007efcd4c53340   r14 = 0x00000fdf9a98a668
>     r15 = 0x0000000000000000   rip = 0x00000000024a767d
>     Found by: call frame info
>  5  impalad!boost::_bi::bind_t&lt;void, boost::_mfi::mf1&lt;void, 
> impala::QueryExecMgr, impala::QueryState*&gt;, 
> boost::_bi::list2&lt;boost::_bi::value&lt;impala::QueryExecMgr*&gt;, 
> boost::_bi::value&lt;impala::QueryState*&gt; &gt; &gt;::operator()() 
> [bind.hpp : 1222 + 0xe]
>     rbx = 0x00007efcd4c53380   rbp = 0x00007efcd4c533f0
>     rsp = 0x00007efcd4c53340   r12 = 0x00007efcd4c53360
>     r13 = 0x00007efcd4c53340   r14 = 0x00000fdf9a98a668
>     r15 = 0x0000000000000000   rip = 0x00000000024b0f5c
>     Found by: call frame info
>  6  impalad!boost::function0&lt;void&gt;::operator()() const 
> [function_template.hpp : 770 + 0x5]
>     rbx = 0x00007efcd4c53440   rbp = 0x00007efcd4c534b0
>     rsp = 0x00007efcd4c53400   r12 = 0x00000fdf9a98a680
>     r13 = 0x00007efcd4c53400   r14 = 0x0000000000000000
>     r15 = 0x00007efcd4c53a88   rip = 0x000000000221c277
>     Found by: call frame info
>  7  
> impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, 
> std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, boost::function&lt;void ()&gt;, 
> impala::ThreadDebugInfo const*, impala::Promise&lt;long, 
> (impala::PromiseMode)0&gt;*) [thread.cc : 360 + 0x9]
>     rbx = 0x00007efcd4c539a0   rbp = 0x00007efcd4c53a50
>     rsp = 0x00007efcd4c534c0   r12 = 0x00007efcd4c534c0
>     r13 = 0x00007efcd4c536c0   r14 = 0x00007efcd4c535f0
>     r15 = 0x00000fe01a982698   rip = 0x0000000002c005da
>     Found by: call frame info
>  8  impalad!void 
> boost::_bi::list5&lt;boost::_bi::value&lt;std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, 
> boost::_bi::value&lt;std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, 
> boost::_bi::value&lt;boost::function&lt;void ()&gt; &gt;, 
> boost::_bi::value&lt;impala::ThreadDebugInfo*&gt;, 
> boost::_bi::value&lt;impala::Promise&lt;long, (impala::PromiseMode)0&gt;*&gt; 
> &gt;::operator()&lt;void (*)(std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, 
> std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, boost::function&lt;void ()&gt;, 
> impala::ThreadDebugInfo const*, impala::Promise&lt;long, 
> (impala::PromiseMode)0&gt;*), 
> boost::_bi::list0&gt;(boost::_bi::type&lt;void&gt;, void 
> (*&amp;)(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, 
> std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, boost::function&lt;void ()&gt;, 
> impala::ThreadDebugInfo const*, impala::Promise&lt;long, 
> (impala::PromiseMode)0&gt;*), boost::_bi::list0&amp;, int) [bind.hpp : 531 + 
> 0x12]
>     rbx = 0x00007efcd4c53ac0   rbp = 0x00007efcd4c53b50
>     rsp = 0x00007efcd4c53a60   r12 = 0x00006160006a73a8
>     r13 = 0x00007efcd4c53b80   r14 = 0x00007efcd4c53a60
>     r15 = 0x00007efd732796c0   rip = 0x0000000002c0ba19
>     Found by: call frame info
>  9  impalad!boost::_bi::bind_t&lt;void, void 
> (*)(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, 
> std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, 
> std::allocator&lt;char&gt; &gt; const&amp;, boost::function&lt;void ()&gt;, 
> impala::ThreadDebugInfo const*, impala::Promise&lt;long, 
> (impala::PromiseMode)0&gt;*), 
> boost::_bi::list5&lt;boost::_bi::value&lt;std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, 
> boost::_bi::value&lt;std::__cxx11::basic_string&lt;char, 
> std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, 
> boost::_bi::value&lt;boost::function&lt;void ()&gt; &gt;, 
> boost::_bi::value&lt;impala::ThreadDebugInfo*&gt;, 
> boost::_bi::value&lt;impala::Promise&lt;long, (impala::PromiseMode)0&gt;*&gt; 
> &gt; &gt;::operator()() [bind.hpp : 1222 + 0xe]
>     rbx = 0x00007efcd4c53ba0   rbp = 0x00007efcd4c53c10
>     rsp = 0x00007efcd4c53b60   r12 = 0x00007efcd4c53b80
>     r13 = 0x00007efcd4c53b60   r14 = 0x00000fdf9a98a76c
>     r15 = 0x0000000000000000   rip = 0x0000000002c0b86c
>     Found by: call frame info
> 10  impalad!thread_proxy + 0x72
>     rbx = 0x00006160006a7180   rbp = 0x0000603000f98230
>     rsp = 0x00007efcd4c53c20   r12 = 0x0000000000000000
>     r13 = 0x00007efcd4c549c0   r14 = 0x00007efcd4c54700
>     r15 = 0x000000000000002b   rip = 0x000000000440b772
>     Found by: call frame info
> 11  libpthread-2.17.so + 0x7e25
>     rbx = 0x0000000000000000   rbp = 0x0000000000000000
>     rsp = 0x00007efcd4c53c60   r12 = 0x0000000000000000
>     r13 = 0x00007efcd4c549c0   r14 = 0x00007efcd4c54700
>     r15 = 0x000000000000002b   rip = 0x00007f05cd34fe25
>     Found by: call frame info
> 12  libc-2.17.so + 0xf834d
>     rsp = 0x00007efcd4c53d00
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to