[ 
https://issues.apache.org/jira/browse/IMPALA-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705725#comment-17705725
 ] 

ASF subversion and git services commented on IMPALA-11751:
----------------------------------------------------------

Commit f3f0293df4c67bea7fdc136469d6835729ddee66 in impala's branch 
refs/heads/branch-4.1.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f3f0293df ]

IMPALA-11751: Template tuple of Avro header should be transferred to 
ScanRangeSharedState

Sequence container based file formats (SequenceFile, RCFile, Avro) have
a file header in each file that describes the metadata of the file, e.g.
codec, default values, etc. The header should be decoded before reading
the file content. The initial scanners will read the header and then
issue follow-up scan ranges for the file content. The decoded header
will be referenced by follow-up scanners.

Since IMPALA-9655, when MT_DOP > 1, the issued scan ranges could be
scheduled to other scan node instances. So the header resource should
live until all scan node instances close. Header objects are owned by
the object pool of the RuntimeState, which meets the requirement.

AvroFileHeader differs from other headers in that it references a
template tuple which contains the partition values and default values
for missing fields. The template tuple is initially owned by the header
scanner, then transferred to the scan node before the scanner closes.
However, when the scan node instance closes, the template tuple is
freed. Scanners of other scan node instances might still depend on it.
This could cause wrong results or crash the impalad.
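
The lifetime problem described above can be modelled with a small,
self-contained sketch. It is not Impala code: TemplateTuple,
PerInstanceScanNode and the partition value are hypothetical stand-ins,
and a std::weak_ptr takes the place of the raw pointer held by the
header so that the dangling reference can be observed safely instead of
triggering a real use-after-free.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch_before {

// Hypothetical stand-in for a template tuple holding a partition value.
struct TemplateTuple {
  std::string partition_value;
};

// Hypothetical stand-in for one scan node instance. It owns every tuple
// transferred to it and frees them all when it closes.
class PerInstanceScanNode {
 public:
  // Takes ownership and returns a non-owning handle, standing in for the
  // raw pointer that the decoded file header keeps.
  std::weak_ptr<TemplateTuple> TransferToScanNodePool(
      std::shared_ptr<TemplateTuple> tuple) {
    pool_.push_back(std::move(tuple));
    return pool_.back();
  }

  void Close() { pool_.clear(); }

 private:
  std::vector<std::shared_ptr<TemplateTuple>> pool_;
};

// Returns true if the tuple is still alive after its owner has closed.
bool TupleSurvivesOwnerClose() {
  PerInstanceScanNode node1;
  PerInstanceScanNode node2;  // would run the follow-up scanner
  // The header scanner on node1 hands the tuple to node1's pool.
  std::weak_ptr<TemplateTuple> header_ref = node1.TransferToScanNodePool(
      std::make_shared<TemplateTuple>(TemplateTuple{"2022-11-28"}));
  // node1 closes first, while node2's scanner may still need the tuple.
  node1.Close();
  (void)node2;
  return !header_ref.expired();
}

}  // namespace sketch_before
```

In this model TupleSurvivesOwnerClose() returns false: once the owning
instance closes, any other instance still holding the reference is left
with freed memory.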

When partition columns are used in the query, or when the underlying
avro files have missing fields and the table schema has default values
for them, the AvroFileHeader will have a non-null template tuple, which
could hit this bug when MT_DOP>1.

This patch fixes the bug by transferring the template tuple to
ScanRangeSharedState directly. The scan_node_pool of HdfsScanNodeBase is
also removed since it's only used to hold the template tuple (and
related buffers) of the avro header. There is also no need to override
TransferToScanNodePool in HdfsScanNode, since its original purpose was
to protect the pool with a lock, and the method in ScanRangeSharedState
already takes a lock.
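
A minimal sketch of the idea behind the fix, under stated assumptions:
SharedState, SharedPoolScanNode and TransferToSharedStatePool are
hypothetical names, not the actual ScanRangeSharedState API, and the
shared state is kept alive here by reference counting purely for
illustration. The point it tries to show is that a tuple owned by state
shared across instances, with transfers guarded by a lock, outlives any
single instance.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

namespace sketch_after {

// Hypothetical stand-in for a template tuple holding a partition value.
struct TemplateTuple {
  std::string partition_value;
};

// Hypothetical stand-in for state shared by all scan node instances.
class SharedState {
 public:
  // Takes ownership of the tuple. The lock makes concurrent transfers
  // from different instances safe.
  const TemplateTuple* TransferToSharedStatePool(
      std::unique_ptr<TemplateTuple> tuple) {
    std::lock_guard<std::mutex> l(lock_);
    pool_.push_back(std::move(tuple));
    return pool_.back().get();
  }

 private:
  std::mutex lock_;
  std::vector<std::unique_ptr<TemplateTuple>> pool_;
};

// Hypothetical scan node instance that only references the shared state.
class SharedPoolScanNode {
 public:
  explicit SharedPoolScanNode(std::shared_ptr<SharedState> shared)
      : shared_(std::move(shared)) {}

  SharedState* shared_state() {
    assert(shared_ != nullptr);  // must not be used after Close()
    return shared_.get();
  }

  void Close() { shared_.reset(); }

 private:
  std::shared_ptr<SharedState> shared_;
};

// Reads the partition value after the transferring instance has closed.
std::string ReadAfterFirstInstanceCloses() {
  auto shared = std::make_shared<SharedState>();
  SharedPoolScanNode node1(shared);
  SharedPoolScanNode node2(shared);
  shared.reset();  // from here on only the instances keep the state alive
  const TemplateTuple* tuple =
      node1.shared_state()->TransferToSharedStatePool(
          std::unique_ptr<TemplateTuple>(new TemplateTuple{"2022-11-28"}));
  node1.Close();  // the tuple stays valid because node2 is still open
  std::string value = tuple->partition_value;
  node2.Close();  // last instance closes, the shared pool is released
  return value;
}

}  // namespace sketch_after
```

Here the read after node1.Close() is well defined, because the pool is
released only when the last instance closes.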

Tests
 - Add missing test coverage for compute stats on avro tables. Note that
   MT_DOP=4 is set by default for compute stats.
 - Add the MT_DOP dimension for TestScannersAllTableFormats. Also add
   some queries that can reveal the bug in scanners.test. Without this
   fix, the ASAN build easily crashes with a heap-use-after-free error.
 - Ran exhaustive tests.

Backport Notes:
 - Trivial conflicts in hdfs-scan-node-base.h and hdfs-scan-node-base.cc
   due to missing iceberg_partition_filtering_pool_ and
   HasVirtualColumnInTemplateTuple().

Change-Id: Iafa43fce7c2ffdc867004d11e5873327c3d8cb42
Reviewed-on: http://gerrit.cloudera.org:8080/19289
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Crash in processing partition columns of Avro table with MT_DOP>1
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11751
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11751
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>         Attachments: date_str_avro.tar.gz, heap-use-after-free-report1.txt, 
> heap-use-after-free-report2.txt
>
>
> We saw a crash in a query that aggregates the string partition column of an 
> Avro table with MT_DOP set to 4. The query is quite simple:
> {code:sql}
> create external table date_str_avro (v int)
>   partitioned by (date_str string)
>   stored as avro;
> -- Import files attached in this JIRA, repeat the following query.
> -- It will crash within 10 runs.
> set MT_DOP=2;
> select count(*), date_str from date_str_avro group by date_str;
> {code}
> It needs a specific data set to reproduce the crash. Files and steps are 
> given later.
> After disabling codegen (by "set disable_codegen=1") and reproducing the 
> crash, the stack trace is:
> {noformat}
> Crash reason:  SIGSEGV /SEGV_MAPERR
> Crash address: 0x0
> Process uptime: not available
> Thread 512 (crashed)
>  0  impalad!impala::HashTableCtx::Hash(void const*, int, unsigned int) const 
> [sse-util.h : 227 + 0x2]
>  1  impalad!impala::HashTableCtx::HashVariableLenRow(unsigned char const*, 
> unsigned char const*) const [hash-table.cc : 306 + 0x8]
>  2  impalad!impala::HashTableCtx::HashRow(unsigned char const*, unsigned char 
> const*) const [hash-table.cc : 255 + 0x5]
>  3  impalad!void 
> impala::GroupingAggregator::EvalAndHashPrefetchGroup<false>(impala::RowBatch*,
>  int, impala::TPrefetchMode::type, impala::HashTableCtx*) 
> [hash-table.inline.h : 39 + 0xe]
>  4  impalad!impala::GroupingAggregator::AddBatchStreamingImpl(int, bool, 
> impala::TPrefetchMode::type, impala::RowBatch*, impala::RowBatch*, 
> impala::HashTableCtx*, int*) [grouping-aggregator-ir.cc : 185 + 0x1c]
>  5  
> impalad!impala::GroupingAggregator::AddBatchStreaming(impala::RuntimeState*, 
> impala::RowBatch*, impala::RowBatch*, bool*) [grouping-aggregator.cc : 520 + 
> 0x2d]
>  6  
> impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*,
>  impala::RowBatch*) [streaming-aggregation-node.cc : 120 + 0x3]
>  7  impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, 
> impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x19]
>  8  impalad!impala::FragmentInstanceState::ExecInternal() 
> [fragment-instance-state.cc : 446 + 0x3]
>  9  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc 
> : 104 + 0xb]
> 10  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> [query-state.cc : 950 + 0x19]
> 11  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 763 
> + 0x3]
> 12  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() [bind.hpp : 531 + 0x3]
> 13  impalad!thread_proxy + 0x67
> 14  libpthread.so.0 + 0x76ba
> 15  libc.so.6 + 0x1074dd
> {noformat}
> This is reproduced on commit 2733d039a of the master branch.
> Reproducing the bug requires the following conditions:
>  * Partitioned Avro table
>  * MT_DOP is set to be larger than 1
>  * Query needs follow-up processing (e.g. GROUP BY, JOIN, etc.) on the 
> partition values or default values of missing fields in the files.
>  * The number of files (blocks) exceeds the number of impalads, so multiple 
> scan fragment instances run on one impalad.
>  * Some scan node instances finish earlier than others, e.g. when there are 
> both small files and large files.
> *Steps to import the attached Avro data files*
> {code:bash}
> $ tar zxf date_str_avro.tar.gz
> $ hdfs dfs -put date_str_avro/* hdfs_location_of_table_dir
> impala-shell> alter table date_str_avro recover partitions;
> {code}
> *RCA*
> This is a bug introduced by IMPALA-9655.
> Each avro file requires at least two scan ranges. The initial range reads the 
> file header and initializes the template tuple. The initial scanner then 
> issues follow-up scan ranges to read the file content. The memory of the 
> template tuple is transferred to the ScanNode. Note that partition values are 
> materialized into the template tuple.
> After IMPALA-9655, the ranges of a file could be scheduled to different 
> ScanNode instances when MT_DOP > 1. In the following sequence, there is an 
> illegal memory access ("heap-use-after-free"), which could cause a crash.
> t0:
> Scanner of ScanNode-1 reads header of a large avro file.
> Scanner of ScanNode-2 reads header of a small avro file.
> Varlen memory of the template_tuple is transferred to the corresponding ScanNode.
> t1:
> Scanner of ScanNode-1 reads content of the small avro file.
> Scanner of ScanNode-2 reads content of the large avro file.
> The scanner will reuse the template_tuple created by the header scanners [1]. 
> So the RowBatch produced by ScanNode-2 actually references memory owned by 
> ScanNode-1.
> t2:
> ScanNode-1 finishes first and closes (assuming no more files to read).
> Downstream consumer of ScanNode-2 will crash if accessing the partition 
> string values.
> [1] 
> [https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/be/src/exec/avro/hdfs-avro-scanner.cc#L478]


