[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269138#comment-17269138 ]

ASF subversion and git services commented on IMPALA-10259:
----------------------------------------------------------

Commit 8ecb61e4bda0b0f712da67f43eea70e5b583f167 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8ecb61e ]

IMPALA-10259: Fixed DCHECK error for backend in terminal state

This issue happened in a core ASAN build.
According to the log messages, one backend sent a status report with
instance_exec_status set to done for all assigned instances without
error, and then sent a last status report with an error. The coordinator
treated the backend state as done after it processed the status report
with instance_exec_status set to done, and did not apply the last status
report with the error to the overall backend state.
This caused the backend to receive a response with status OK for the
last status report, and hence hit the DCHECK error.
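The sequence above can be modeled with a small C++ sketch. The names StatusReport, BackendStateModel and ApplyReport are hypothetical and do not correspond to Impala's coordinator code. The sketch only shows how treating a backend as terminal after an all-done report causes a later error report to be ignored and answered with OK.

```cpp
#include <cassert>

// Hypothetical model of the coordinator-side behavior described above; these
// are illustrative names, not Impala's actual BackendState API.
struct StatusReport {
  bool all_instances_done;  // every assigned instance reported "done"
  bool has_error;           // the report carries a non-OK status
};

class BackendStateModel {
 public:
  // Processes one report and returns the status sent back to the backend:
  // true means an error response, false means OK.
  bool ApplyReport(const StatusReport& report) {
    if (terminal_) {
      // The flaw being modeled: once the backend is considered done, later
      // reports are not applied, so a trailing error report is answered
      // with the stale (OK) overall status.
      return overall_error_;
    }
    if (report.has_error) overall_error_ = true;
    if (report.all_instances_done || report.has_error) terminal_ = true;
    return overall_error_;
  }

 private:
  bool terminal_ = false;       // backend is in a terminal state
  bool overall_error_ = false;  // error applied to the overall backend state
};
```

In this model, an all-done report without error followed by a report with an error gets an OK (false) response for the second report. This is the situation the commit message identifies as the trigger for the DCHECK failure.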

This patch fixes the race between updating the 'Query State' and updating
the fragment instance state when an error is hit during execution of a
fragment instance. After hitting an error, backends no longer send a
status report with the fragment instance state as "completed" without
the error.
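One way to close such a window is to record the error and the terminal instance state in a single critical section. That way, a report can never observe "done" without the error. The sketch below assumes that approach. QueryStateSketch, FinishInstance and BuildReport are hypothetical names, and this is not the actual patch.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical sketch (not Impala's QueryState): the fragment instance state
// and the first error are guarded by one mutex and updated together.
enum class InstanceState { kRunning, kDone };

// What a status report would carry; an empty 'error' means "no error".
struct ReportSnapshot {
  InstanceState state;
  std::string error;
};

class QueryStateSketch {
 public:
  // Called when the fragment instance finishes; 'error' is empty on success.
  // Both fields change in the same critical section, so there is no window
  // in which a report sees kDone without the error.
  void FinishInstance(const std::string& error) {
    std::lock_guard<std::mutex> l(lock_);
    if (!error.empty() && error_.empty()) error_ = error;  // keep first error
    state_ = InstanceState::kDone;
  }

  // Builds the next status report from a consistent snapshot.
  ReportSnapshot BuildReport() const {
    std::lock_guard<std::mutex> l(lock_);
    return ReportSnapshot{state_, error_};
  }

 private:
  mutable std::mutex lock_;
  InstanceState state_ = InstanceState::kRunning;
  std::string error_;
};
```

A single lock is the simplest way to express the invariant. Another possible design is to publish the error before marking the instance done, so that any reader that sees "done" also sees the error.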

Testing:
 - Manual tests
   I could only reproduce the issue by adding artificial delays at
   the beginning of QueryState::ErrorDuringExecute() and repeatedly
   running the test case
   test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj
   on an Impala ASAN build.
   Verified that the issue did not happen after applying this
   patch.
 - Passed exhaustive tests.

Change-Id: Ic12a80e20ddc11e32349edfec2bd16338c24b841
Reviewed-on: http://gerrit.cloudera.org:8080/16900
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
> -----------------------------------------------------------
>
>                 Key: IMPALA-10259
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10259
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build, crash
>             Fix For: Impala 4.0
>
>
> TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN 
> build:
> {code:java}
> F1016 17:08:54.728466 19955 query-state.cc:877] 924f4ce603ac07bb:a08656e300000000] Check failed: is_cancelled_.Load() == 1 (0 vs. 1)  {code}
> The test is:
> {code:java}
> shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2] {code}
> The query is:
> {code:java}
> I1016 17:08:49.026532 19947 Frontend.java:1522] 924f4ce603ac07bb:a08656e300000000] Analyzing query: select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code}
> Query options:
> {code:java}
> I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] TClientRequest.queryOptions: TQueryOptions {
>   01: abort_on_error (bool) = true,
>   02: max_errors (i32) = 100,
>   03: disable_codegen (bool) = false,
>   04: batch_size (i32) = 0,
>   05: num_nodes (i32) = 0,
>   06: max_scan_range_length (i64) = 0,
>   07: num_scanner_threads (i32) = 0,
>   11: debug_action (string) = "",
>   12: mem_limit (i64) = 0,
>   15: hbase_caching (i32) = 0,
>   16: hbase_cache_blocks (bool) = false,
>   17: parquet_file_size (i64) = 0,
>   18: explain_level (i32) = 1,
>   19: sync_ddl (bool) = false,
>   24: disable_outermost_topn (bool) = false,
>   26: query_timeout_s (i32) = 0,
>   28: appx_count_distinct (bool) = false,
>   29: disable_unsafe_spills (bool) = false,
>   31: exec_single_node_rows_threshold (i32) = 100,
>   32: optimize_partition_key_scans (bool) = false,
>   33: replica_preference (i32) = 0,
>   34: schedule_random_replica (bool) = false,
>   36: disable_streaming_preaggregations (bool) = false,
>   37: runtime_filter_mode (i32) = 2,
>   38: runtime_bloom_filter_size (i32) = 1048576,
>   39: runtime_filter_wait_time_ms (i32) = 0,
>   40: disable_row_runtime_filtering (bool) = false,
>   41: max_num_runtime_filters (i32) = 10,
>   42: parquet_annotate_strings_utf8 (bool) = false,
>   43: parquet_fallback_schema_resolution (i32) = 0,
>   45: s3_skip_insert_staging (bool) = true,
>   46: runtime_filter_min_size (i32) = 1048576,
>   47: runtime_filter_max_size (i32) = 16777216,
>   48: prefetch_mode (i32) = 1,
>   49: strict_mode (bool) = false,
>   50: scratch_limit (i64) = -1,
>   51: enable_expr_rewrites (bool) = true,
>   52: decimal_v2 (bool) = true,
>   53: parquet_dictionary_filtering (bool) = true,
>   54: parquet_array_resolution (i32) = 0,
>   55: parquet_read_statistics (bool) = true,
>   56: default_join_distribution_mode (i32) = 0,
>   57: disable_codegen_rows_threshold (i32) = 50000,
>   58: default_spillable_buffer_size (i64) = 2097152,
>   59: min_spillable_buffer_size (i64) = 65536,
>   60: max_row_size (i64) = 524288,
>   61: idle_session_timeout (i32) = 0,
>   62: compute_stats_min_sample_size (i64) = 1073741824,
>   63: exec_time_limit_s (i32) = 0,
>   64: shuffle_distinct_exprs (bool) = true,
>   65: max_mem_estimate_for_admission (i64) = 0,
>   66: thread_reservation_limit (i32) = 3000,
>   67: thread_reservation_aggregate_limit (i32) = 0,
>   68: kudu_read_mode (i32) = 0,
>   69: allow_erasure_coded_files (bool) = false,
>   70: timezone (string) = "",
>   71: scan_bytes_limit (i64) = 0,
>   72: cpu_limit_s (i64) = 0,
>   73: topn_bytes_limit (i64) = 536870912,
>   74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) built on Fri Oct 16 13:26:18 PDT 2020",
>   75: resource_trace_ratio (double) = 0,
>   76: num_remote_executor_candidates (i32) = 3,
>   77: num_rows_produced_limit (i64) = 0,
>   78: planner_testcase_mode (bool) = false,
>   79: default_file_format (i32) = 0,
>   80: parquet_timestamp_type (i32) = 0,
>   81: parquet_read_page_index (bool) = true,
>   82: parquet_write_page_index (bool) = true,
>   84: disable_hdfs_num_rows_estimate (bool) = false,
>   86: spool_query_results (bool) = false,
>   87: default_transactional_type (i32) = 0,
>   88: statement_expression_limit (i32) = 250000,
>   89: max_statement_length_bytes (i32) = 16777216,
>   90: disable_data_cache (bool) = false,
>   91: max_result_spooling_mem (i64) = 104857600,
>   92: max_spilled_result_spooling_mem (i64) = 1073741824,
>   93: disable_hbase_num_rows_estimate (bool) = false,
>   94: fetch_rows_timeout_ms (i64) = 10000,
>   95: now_string (string) = "",
>   96: parquet_object_store_split_size (i64) = 268435456,
>   97: mem_limit_executors (i64) = 0,
>   98: broadcast_bytes_limit (i64) = 34359738368,
>   99: preagg_bytes_limit (i64) = -1,
>   100: enable_cnf_rewrites (bool) = true,
>   101: max_cnf_exprs (i32) = 0,
>   102: kudu_snapshot_read_timestamp_micros (i64) = 0,
>   103: retry_failed_queries (bool) = false,
>   104: enabled_runtime_filter_types (i32) = 3,
>   105: async_codegen (bool) = false,
>   106: enable_distinct_semi_join_optimization (bool) = true,
>   107: sort_run_bytes_limit (i64) = -1,
>   108: max_fs_writers (i32) = 0,
>   109: refresh_updated_hms_partitions (bool) = false,
>   110: spool_all_results_for_retries (bool) = true,
>   112: use_local_tz_for_unix_timestamp_conversions (bool) = false,
>   113: convert_legacy_hive_parquet_utc_timestamps (bool) = false,
>   114: enable_outer_join_to_inner_transformation (bool) = false,
> } {code}
> Stacktrace:
> {code:java}
> Thread 392 (crashed)
>  0  libc-2.17.so + 0x351f7
>  1  impalad!google::LogMessage::Flush() + 0x1eb
>  2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
>  3  impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 877 + 0x45]
>  4  impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*) [query-exec-mgr.cc : 162 + 0x8]
>  5  impalad!boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::QueryExecMgr, impala::QueryState*>, boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::_bi::value<impala::QueryState*> > >::operator()() [bind.hpp : 1222 + 0xe]
>  6  impalad!boost::function0<void>::operator()() const [function_template.hpp : 770 + 0x5]
>  7  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0x9]
>  8  impalad!void boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> >::operator()<void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list0&, int) [bind.hpp : 531 + 0x12]
>  9  impalad!boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >::operator()() [bind.hpp : 1222 + 0xe]
> 10  impalad!thread_proxy + 0x72
> 11  libpthread-2.17.so + 0x7e25
> 12  libc-2.17.so + 0xf834d {code}
> This looks like IMPALA-10050, but I'm not sure whether it has the same cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
