[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313684#comment-17313684 ]
ASF subversion and git services commented on IMPALA-10259:
----------------------------------------------------------
Commit d621c68136b4a47388f579de5e3fe9fd3372bd68 in impala's branch
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d621c68 ]
IMPALA-10259 (part 2): Fixed DCHECK error for backend in terminal state
The previous patch tried to fix the race by preventing backends from
sending a status report with the fragment instance state as "done" and
overall_status as OK after a fragment instance fails. But it does not
work when the fragment instance state is updated while the status
report is being generated.
For a failed fragment instance, the backend should report the instance
as "done" only when overall_status is reported with an error. The final
fragment instance state will be reported in the final status report.
This prevents the coordinator from ignoring the last status report.
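The rule above can be illustrated with a minimal, self-contained sketch.
It is not the code from this patch: InstanceSnapshot, InstanceReport,
StatusReport and BuildReport are hypothetical, simplified stand-ins for
the real report structures, and the only assumption is that the overall
status is captured before the per-instance states are read.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one fragment instance at the moment
// its state is read while a status report is being built.
struct InstanceSnapshot {
  bool done;    // the instance has finished executing
  bool failed;  // the instance finished with an error
};

// Hypothetical per-instance entry in a status report.
struct InstanceReport {
  bool done;
};

// Hypothetical status report: overall status plus per-instance entries.
struct StatusReport {
  bool overall_ok;  // true if overall_status was captured as OK
  std::vector<InstanceReport> instances;
};

// Builds a report from an overall status captured earlier and instance
// states read afterwards. An instance may fail between the two reads.
StatusReport BuildReport(bool overall_ok,
                         const std::vector<InstanceSnapshot>& instances) {
  StatusReport report{overall_ok, {}};
  for (const InstanceSnapshot& inst : instances) {
    // A failed instance is reported as "done" only if this report also
    // carries the error; otherwise it is deferred to a later report.
    bool report_done = inst.done && (!inst.failed || !overall_ok);
    // Invariant: never "done" for a failed instance with an OK status.
    assert(!(report_done && inst.failed && overall_ok));
    report.instances.push_back({report_done});
  }
  return report;
}
```

Under these assumptions, a report whose overall status was captured as OK
just before an instance failed leaves that instance as not done, and a
later report that carries the error marks it as done.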
Testing:
- Manual tests
I could only reproduce the situation by adding artificial delays
in QueryState::ConstructReport() after overall_status is set
for the status report, while repeatedly running the
test case test_spilling.py::TestSpillingNoDebugActionDimensions
::test_spilling_no_debug_action. Verified that the issue did
not happen after applying this patch.
- Passed exhaustive tests.
Change-Id: Ifd9820f9944a78811ee7acfa5870a9418902b17b
Reviewed-on: http://gerrit.cloudera.org:8080/17258
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
> -----------------------------------------------------------
>
> Key: IMPALA-10259
> URL: https://issues.apache.org/jira/browse/IMPALA-10259
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Wenzhe Zhou
> Priority: Blocker
> Labels: broken-build, crash
> Fix For: Impala 4.0
>
>
> TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN
> build:
> {code:java}
> F1016 17:08:54.728466 19955 query-state.cc:877]
> 924f4ce603ac07bb:a08656e300000000] Check failed: is_cancelled_.Load() == 1 (0
> vs. 1) {code}
> The test is:
> {code:java}
> shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension:
> ('textfile', '.txt') | protocol: hs2] {code}
> The query is:
> {code:java}
> I1016 17:08:49.026532 19947 Frontend.java:1522]
> 924f4ce603ac07bb:a08656e300000000] Analyzing query: select id, cnt from
> functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code}
> Query options:
> {code:java}
> I1016 17:08:49.020670 19947 impala-hs2-server.cc:269]
> TClientRequest.queryOptions: TQueryOptions {
> 01: abort_on_error (bool) = true,
> 02: max_errors (i32) = 100,
> 03: disable_codegen (bool) = false,
> 04: batch_size (i32) = 0,
> 05: num_nodes (i32) = 0,
> 06: max_scan_range_length (i64) = 0,
> 07: num_scanner_threads (i32) = 0,
> 11: debug_action (string) = "",
> 12: mem_limit (i64) = 0,
> 15: hbase_caching (i32) = 0,
> 16: hbase_cache_blocks (bool) = false,
> 17: parquet_file_size (i64) = 0,
> 18: explain_level (i32) = 1,
> 19: sync_ddl (bool) = false,
> 24: disable_outermost_topn (bool) = false,
> 26: query_timeout_s (i32) = 0,
> 28: appx_count_distinct (bool) = false,
> 29: disable_unsafe_spills (bool) = false,
> 31: exec_single_node_rows_threshold (i32) = 100,
> 32: optimize_partition_key_scans (bool) = false,
> 33: replica_preference (i32) = 0,
> 34: schedule_random_replica (bool) = false,
> 36: disable_streaming_preaggregations (bool) = false,
> 37: runtime_filter_mode (i32) = 2,
> 38: runtime_bloom_filter_size (i32) = 1048576,
> 39: runtime_filter_wait_time_ms (i32) = 0,
> 40: disable_row_runtime_filtering (bool) = false,
> 41: max_num_runtime_filters (i32) = 10,
> 42: parquet_annotate_strings_utf8 (bool) = false,
> 43: parquet_fallback_schema_resolution (i32) = 0,
> 45: s3_skip_insert_staging (bool) = true,
> 46: runtime_filter_min_size (i32) = 1048576,
> 47: runtime_filter_max_size (i32) = 16777216,
> 48: prefetch_mode (i32) = 1,
> 49: strict_mode (bool) = false,
> 50: scratch_limit (i64) = -1,
> 51: enable_expr_rewrites (bool) = true,
> 52: decimal_v2 (bool) = true,
> 53: parquet_dictionary_filtering (bool) = true,
> 54: parquet_array_resolution (i32) = 0,
> 55: parquet_read_statistics (bool) = true,
> 56: default_join_distribution_mode (i32) = 0,
> 57: disable_codegen_rows_threshold (i32) = 50000,
> 58: default_spillable_buffer_size (i64) = 2097152,
> 59: min_spillable_buffer_size (i64) = 65536,
> 60: max_row_size (i64) = 524288,
> 61: idle_session_timeout (i32) = 0,
> 62: compute_stats_min_sample_size (i64) = 1073741824,
> 63: exec_time_limit_s (i32) = 0,
> 64: shuffle_distinct_exprs (bool) = true,
> 65: max_mem_estimate_for_admission (i64) = 0,
> 66: thread_reservation_limit (i32) = 3000,
> 67: thread_reservation_aggregate_limit (i32) = 0,
> 68: kudu_read_mode (i32) = 0,
> 69: allow_erasure_coded_files (bool) = false,
> 70: timezone (string) = "",
> 71: scan_bytes_limit (i64) = 0,
> 72: cpu_limit_s (i64) = 0,
> 73: topn_bytes_limit (i64) = 536870912,
> 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec)
> built on Fri Oct 16 13:26:18 PDT 2020",
> 75: resource_trace_ratio (double) = 0,
> 76: num_remote_executor_candidates (i32) = 3,
> 77: num_rows_produced_limit (i64) = 0,
> 78: planner_testcase_mode (bool) = false,
> 79: default_file_format (i32) = 0,
> 80: parquet_timestamp_type (i32) = 0,
> 81: parquet_read_page_index (bool) = true,
> 82: parquet_write_page_index (bool) = true,
> 84: disable_hdfs_num_rows_estimate (bool) = false,
> 86: spool_query_results (bool) = false,
> 87: default_transactional_type (i32) = 0,
> 88: statement_expression_limit (i32) = 250000,
> 89: max_statement_length_bytes (i32) = 16777216,
> 90: disable_data_cache (bool) = false,
> 91: max_result_spooling_mem (i64) = 104857600,
> 92: max_spilled_result_spooling_mem (i64) = 1073741824,
> 93: disable_hbase_num_rows_estimate (bool) = false,
> 94: fetch_rows_timeout_ms (i64) = 10000,
> 95: now_string (string) = "",
> 96: parquet_object_store_split_size (i64) = 268435456,
> 97: mem_limit_executors (i64) = 0,
> 98: broadcast_bytes_limit (i64) = 34359738368,
> 99: preagg_bytes_limit (i64) = -1,
> 100: enable_cnf_rewrites (bool) = true,
> 101: max_cnf_exprs (i32) = 0,
> 102: kudu_snapshot_read_timestamp_micros (i64) = 0,
> 103: retry_failed_queries (bool) = false,
> 104: enabled_runtime_filter_types (i32) = 3,
> 105: async_codegen (bool) = false,
> 106: enable_distinct_semi_join_optimization (bool) = true,
> 107: sort_run_bytes_limit (i64) = -1,
> 108: max_fs_writers (i32) = 0,
> 109: refresh_updated_hms_partitions (bool) = false,
> 110: spool_all_results_for_retries (bool) = true,
> 112: use_local_tz_for_unix_timestamp_conversions (bool) = false,
> 113: convert_legacy_hive_parquet_utc_timestamps (bool) = false,
> 114: enable_outer_join_to_inner_transformation (bool) = false,
> } {code}
> Stacktrace:
> {code:java}
> Thread 392 (crashed)
> 0 libc-2.17.so + 0x351f7
> 1 impalad!google::LogMessage::Flush() + 0x1eb
> 2 impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
> 3 impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 877 +
> 0x45]
> 4 impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*)
> [query-exec-mgr.cc : 162 + 0x8]
> 5 impalad!boost::_bi::bind_t<void, boost::_mfi::mf1<void,
> impala::QueryExecMgr, impala::QueryState*>,
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>,
> boost::_bi::value<impala::QueryState*> > >::operator()() [bind.hpp : 1222 +
> 0xe]
> 6 impalad!boost::function0<void>::operator()() const [function_template.hpp
> : 770 + 0x5]
> 7 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0x9]
> 8 impalad!void
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*>
> >::operator()<void (*)(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*),
> boost::_bi::list0>(boost::_bi::type<void>, void
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) [bind.hpp : 531 + 0x12]
> 9 impalad!boost::_bi::bind_t<void, void
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> >
> >::operator()() [bind.hpp : 1222 + 0xe]
> 10 impalad!thread_proxy + 0x72
> 11 libpthread-2.17.so + 0x7e25
> 12 libc-2.17.so + 0xf834d {code}
> This looks like IMPALA-10050, but I'm not sure whether it has the same cause.