[
https://issues.apache.org/jira/browse/IMPALA-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-11751:
------------------------------------
Summary: Crash in processing partition columns of Avro table with MT_DOP>1
(was: Crash in processing string partition columns of Avro table with MT_DOP>1)
> Crash in processing partition columns of Avro table with MT_DOP>1
> -----------------------------------------------------------------
>
> Key: IMPALA-11751
> URL: https://issues.apache.org/jira/browse/IMPALA-11751
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.1.1
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: date_str_avro.tar.gz, heap-use-after-free-report1.txt,
> heap-use-after-free-report2.txt
>
>
> We saw a crash in a query that aggregates on the string partition column of an
> Avro table with MT_DOP set to 4. The query is quite simple:
> {code:sql}
> create external table date_str_avro (v int)
> partitioned by (date_str string)
> stored as avro;
> -- Import the files attached to this JIRA, then repeat the following query.
> -- It will crash within 10 runs.
> set MT_DOP=2;
> select count(*), date_str from date_str_avro group by date_str;
> {code}
> It needs a specific data set to reproduce the crash. Files and steps are given
> later.
> Disabling codegen (by "set disable_codegen=1") and reproducing the crash gives
> the following stacktrace:
> {noformat}
> Crash reason: SIGSEGV /SEGV_MAPERR
> Crash address: 0x0
> Process uptime: not available
> Thread 512 (crashed)
> 0 impalad!impala::HashTableCtx::Hash(void const*, int, unsigned int) const
> [sse-util.h : 227 + 0x2]
> 1 impalad!impala::HashTableCtx::HashVariableLenRow(unsigned char const*,
> unsigned char const*) const [hash-table.cc : 306 + 0x8]
> 2 impalad!impala::HashTableCtx::HashRow(unsigned char const*, unsigned char
> const*) const [hash-table.cc : 255 + 0x5]
> 3 impalad!void
> impala::GroupingAggregator::EvalAndHashPrefetchGroup<false>(impala::RowBatch*,
> int, impala::TPrefetchMode::type, impala::HashTableCtx*)
> [hash-table.inline.h : 39 + 0xe]
> 4 impalad!impala::GroupingAggregator::AddBatchStreamingImpl(int, bool,
> impala::TPrefetchMode::type, impala::RowBatch*, impala::RowBatch*,
> impala::HashTableCtx*, int*) [grouping-aggregator-ir.cc : 185 + 0x1c]
> 5
> impalad!impala::GroupingAggregator::AddBatchStreaming(impala::RuntimeState*,
> impala::RowBatch*, impala::RowBatch*, bool*) [grouping-aggregator.cc : 520 +
> 0x2d]
> 6
> impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*,
> impala::RowBatch*) [streaming-aggregation-node.cc : 120 + 0x3]
> 7 impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*,
> impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x19]
> 8 impalad!impala::FragmentInstanceState::ExecInternal()
> [fragment-instance-state.cc : 446 + 0x3]
> 9 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc
> : 104 + 0xb]
> 10 impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*)
> [query-state.cc : 950 + 0x19]
> 11 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 763
> + 0x3]
> 12 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >
> >::run() [bind.hpp : 531 + 0x3]
> 13 impalad!thread_proxy + 0x67
> 14 libpthread.so.0 + 0x76ba
> 15 libc.so.6 + 0x1074dd
> {noformat}
> This is reproduced on commit 2733d039a of the master branch.
> Reproducing the bug requires the following conditions:
> * Partitioned Avro table
> * MT_DOP is set to a value larger than 1
> * The query does follow-up processing (e.g. GROUP BY, JOIN, etc.) on the
> partition values or default values of missing fields in the files.
> * The number of files (blocks) exceeds the number of impalads, so multiple scan
> fragment instances run on one impalad.
> * Some scan node instances finish earlier than others, e.g. when there are
> both small files and large files.
> *Steps to import the attached Avro data files*
> {code:java}
> $ tar zxf date_str_avro.tar.gz
> $ hdfs dfs -put date_str_avro/* hdfs_location_of_table_dir
> impala-shell> alter table date_str_avro recover partitions;
> {code}
> *RCA*
> This is a bug introduced by IMPALA-9655.
> Each Avro file requires at least two scan ranges. The initial range reads the
> file header and initializes the template tuple. The initial scanner then
> issues follow-up scan ranges to read the file content. Memory of the template
> tuple is transferred to the ScanNode. Note that partition values are
> materialized into the template tuple.
> After IMPALA-9655, the ranges of a file can be scheduled to different
> ScanNode instances when MT_DOP > 1. The following sequence leads to an
> illegal memory access ("heap-use-after-free"), which can cause a crash.
> t0:
> The scanner of ScanNode-1 reads the header of a large Avro file.
> The scanner of ScanNode-2 reads the header of a small Avro file.
> Varlen memory of each template_tuple is transferred to the corresponding
> ScanNode.
> t1:
> The scanner of ScanNode-1 reads the content of the small Avro file.
> The scanner of ScanNode-2 reads the content of the large Avro file.
> Each scanner reuses the template_tuple created by the header scanner [1], so
> the RowBatch produced by ScanNode-2 actually references memory owned by
> ScanNode-1.
> t2:
> ScanNode-1 finishes first and closes (assuming no more files to read).
> The downstream consumer of ScanNode-2 will crash if it accesses the partition
> string values.
> [1]
> [https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/be/src/exec/avro/hdfs-avro-scanner.cc#L478]
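
The t0/t1/t2 sequence in the RCA can be modelled with a small, self-contained simulation. This is only a sketch of the ownership problem, not Impala code: the classes `OwnedMemory`, `VarlenRef` and `ScanNodeInstance`, the shared `templates` dict, the file names and the partition values are all hypothetical stand-ins. Freed memory is tracked with a flag so that the stale access raises an error instead of being undefined behaviour.

```python
class OwnedMemory:
    """Hypothetical stand-in for varlen memory owned by one scan node instance."""

    def __init__(self, owner):
        self.owner = owner
        self.freed = False
        self._buffers = []

    def allocate(self, data):
        # Store the bytes and hand back a "pointer" into this owner's memory.
        self._buffers.append(data)
        return VarlenRef(self, len(self._buffers) - 1)

    def free_all(self):
        self._buffers.clear()
        self.freed = True


class VarlenRef:
    """Hypothetical stand-in for a string slot pointing into OwnedMemory."""

    def __init__(self, memory, index):
        self.memory = memory
        self.index = index

    def read(self):
        if self.memory.freed:
            raise RuntimeError(
                "use-after-free: memory owned by %s was released" % self.memory.owner)
        return self.memory._buffers[self.index]


class ScanNodeInstance:
    def __init__(self, name, templates):
        self.name = name
        self.memory = OwnedMemory(name)
        # Assumption: per-file template tuples are visible to every scan node
        # instance on the same host. Modelled here as a plain shared dict.
        self.templates = templates

    def read_header(self, filename, partition_value):
        # t0: the header scanner builds the template tuple; its varlen memory
        # ends up owned by *this* instance.
        self.templates[filename] = self.memory.allocate(partition_value)

    def read_content(self, filename, num_rows):
        # t1: the content scanner reuses the template tuple, so every row
        # references memory owned by whichever instance read the header.
        template = self.templates[filename]
        return [template for _ in range(num_rows)]

    def close(self):
        # t2: closing releases everything this instance owns.
        self.memory.free_all()


def simulate():
    templates = {}
    node1 = ScanNodeInstance("ScanNode-1", templates)
    node2 = ScanNodeInstance("ScanNode-2", templates)

    # t0: headers are read by different instances.
    node1.read_header("large.avro", "2022-11-25")
    node2.read_header("small.avro", "2022-11-26")

    # t1: contents are read by the *other* instance.
    node1.read_content("small.avro", 1)
    large_rows = node2.read_content("large.avro", 3)
    before_close = [row.read() for row in large_rows]

    # t2: ScanNode-1 finishes first; ScanNode-2's rows are now dangling.
    node1.close()
    try:
        for row in large_rows:
            row.read()
        outcome = "ok"
    except RuntimeError as err:
        outcome = str(err)
    return before_close, outcome
```

Calling `simulate()` returns the partition values read while ScanNode-1 was still open, plus the error raised once a downstream consumer touches the same rows after ScanNode-1 has closed. In the real backend there is no liveness flag, so the equivalent read is the heap-use-after-free described above.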