[ https://issues.apache.org/jira/browse/IMPALA-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900234#comment-17900234 ]
Quanlong Huang commented on IMPALA-13564:
-----------------------------------------
[~liuyuan43] Thanks for reporting this! It seems to be a bug in late materialization.
To help us reproduce this issue, could you share the query options and the table
schema? Column stats such as the max string length would also be helpful.
BTW, you can try setting the query option parquet_late_materialization_threshold=-1
to disable late materialization.
> Executor crash in impala::DecodeValue<impala::StringValue> when selecting * from
> a table that has hundreds of string columns
> -------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-13564
> URL: https://issues.apache.org/jira/browse/IMPALA-13564
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: LiuYuan
> Priority: Major
>
> When I run select * on a table that has hundreds of string columns, I get a
> SIGSEGV.
> Here is the gdb backtrace:
> {code:java}
> (gdb) bt
> #0 0x0000000001e9b90c in impala::DecodeValue<impala::StringValue>
> (decode_error=0x7f3a31d60b04, out_val=0x427ec435, idx=<optimized out>,
> dict_len=17, dict=0x17292780)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/bit-packing.inline.h:295
> #1 impala::BitPacking::UnpackAndDecode32Values<impala::StringValue, 5>
> (in=in@entry=0x1cc2b622 "\002\204\001\300", dict=0x17292780,
> dict_len=dict_len@entry=17, out=<optimized out>, stride=stride@entry=7049,
> decode_error=0x7f3a31d60b04,
> in_bytes=<optimized out>) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/bit-packing.inline.h:356
> #2 0x0000000001f5e7c1 in
> impala::BitPacking::UnpackAndDecodeValues<impala::StringValue, 5>
> (decode_error=0x7f3a31d60b04, stride=7049, out=0x42563774,
> num_values=<optimized out>, dict_len=17, dict=0x17292780, in_bytes=<optimized
> out>,
> in=<optimized out>) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/bit-packing.inline.h:145
> #3 impala::BitPacking::UnpackAndDecodeValues<impala::StringValue>
> (bit_width=<optimized out>, in=<optimized out>, in_bytes=12210,
> dict=dict@entry=0x17292780, dict_len=dict_len@entry=17,
> num_values=num_values@entry=480, out=0x42563774,
> stride=7049, decode_error=0x7f3a31d60b04) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/bit-packing.inline.h:124
> #4 0x0000000001f649fd in
> impala::BatchedBitReader::UnpackAndDecodeBatch<impala::StringValue>
> (stride=7049, v=0x42563774, num_values=<optimized out>, dict_len=<optimized
> out>, dict=0x17292780, bit_width=<optimized out>, this=<optimized out>)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/bit-stream-utils.inline.h:199
> #5 impala::RleBatchDecoder<unsigned
> int>::DecodeLiteralValues<impala::StringValue> (out=<synthetic pointer>,
> dict_len=<optimized out>, dict=0x17292780, num_literals_to_consume=504,
> this=<optimized out>)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/rle-encoding.h:628
> #6 impala::DictDecoder<impala::StringValue>::GetNextValues (count=520,
> stride=7049, first_value=0x42563774, this=0x20391450) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/dict-encoding.h:549
> #7 impala::ScalarColumnReader<impala::StringValue, (parquet::Type::type)6,
> true>::DecodeValues<(parquet::Encoding::type)2> (out_vals=0x42563774,
> count=<optimized out>, stride=7049, this=0x20391000)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:858
> #8 impala::ScalarColumnReader<impala::StringValue, (parquet::Type::type)6,
> true>::DecodeValues (out_vals=0x42563774, count=<optimized out>, stride=7049,
> this=0x20391000)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:846
> #9 impala::ScalarColumnReader<impala::StringValue, (parquet::Type::type)6,
> true>::ReadSlotsNoConversion (this=this@entry=0x20391000,
> num_to_read=<optimized out>, tuple_size=tuple_size@entry=7049,
> tuple_mem=<optimized out>)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:770
> #10 0x0000000001f653c1 in impala::ScalarColumnReader<impala::StringValue,
> (parquet::Type::type)6, true>::ReadSlots (tuple_mem=<optimized out>,
> tuple_size=7049, num_to_read=1024, this=0x20391000)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:735
> #11 impala::ScalarColumnReader<impala::StringValue, (parquet::Type::type)6,
> true>::MaterializeValueBatchRepeatedDefLevel (this=this@entry=0x20391000,
> max_values=max_values@entry=1024, tuple_size=tuple_size@entry=7049,
> tuple_mem=tuple_mem@entry=0x42200000 "@\247k\005",
> num_values=num_values@entry=0x7f3a31d60c20) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:663
> #12 0x0000000001f65dcc in impala::ScalarColumnReader<impala::StringValue,
> (parquet::Type::type)6, true>::ReadValueBatch<false> (this=0x20391000,
> max_values=1024, tuple_size=7049, tuple_mem=0x42200000 "@\247k\005",
> num_values=0x1f5858d0)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/parquet-column-readers.cc:496
> #13 0x0000000001db8f5b in impala::HdfsParquetScanner::FillScratchMicroBatches
> (this=0x1f5ad800, column_readers=..., row_batch=0x20890500,
> skip_row_group=0x1f5ada58, micro_batches=0x1f5adf04, num_micro_batches=1,
> max_num_tuples=595,
> num_tuples=<optimized out>) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/hdfs-parquet-scanner.cc:2508
> #14 0x0000000001dcdf34 in impala::HdfsParquetScanner::AssembleRows<false>
> (this=this@entry=0x1f5ad800, row_batch=row_batch@entry=0x20890500,
> skip_row_group=<optimized out>)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:103
> #15 0x0000000001dcb9d0 in impala::HdfsParquetScanner::GetNextInternal
> (this=0x1f5ad800, row_batch=0x20890500) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/hdfs-parquet-scanner.cc:532
> #16 0x0000000001dbc2ae in impala::HdfsParquetScanner::ProcessSplit
> (this=0x1f5ad800) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/parquet/hdfs-parquet-scanner.cc:416
> #17 0x0000000001d3cdd6 in impala::HdfsScanNode::ProcessSplit
> (this=0x18c3e000, filter_ctxs=..., expr_results_pool=<optimized out>,
> scan_range=0x1f04c180, scanner_thread_reservation=0x7f3a31d61378)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/hdfs-scan-node.cc:495
> #18 0x0000000001d3f41d in impala::HdfsScanNode::ScannerThread
> (this=0x18c3e000, first_thread=false, scanner_thread_reservation=<optimized
> out>) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/exec/hdfs-scan-node.cc:413
> #19 0x0000000001b6bd59 in boost::function0<void>::operator()
> (this=0x7f3a31d619d0) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
> #20 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=...,
> functor=..., parent_thread_info=0x7f3a3e27e750, thread_started=0x7f3a3e27dd30)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/be/src/util/thread.cc:360
> #21 0x0000000001b6cff1 in
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*>
> >::operator()<void (*)(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*),
> boost::_bi::list0>(boost::_bi::type<void>, void
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) (a=<synthetic
> pointer>...,
> f=@0x1e5657f8: 0x1b6ba20
> <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*)>, this=0x1e565800)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:531
> #22 boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> >
> >::operator()() (this=0x1e5657f8)
> at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:1294
> #23 boost::detail::thread_data<boost::_bi::bind_t<void, void
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >
> >::run() (
> this=0x1e5656c0) at
> /data/fuxi_ci_workspace/6713819dec1a06263634b15a/toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:120
> #24 0x000000000242e2c7 in thread_proxy ()
> #25 0x00007f3b8717c67a in ?? () from /usr/lib64/libc.so.6
> #26 0x00007f3b871ff160 in ?? () from /usr/lib64/libc.so.6
> #27 0x0000000000000000 in ?? () {code}
> I think min(micro_batches[r].length, scratch_batch_->capacity) should be
> passed instead of micro_batches[r].length. micro_batches[r].length is 1024,
> but scratch_batch_->capacity is less than 1024 when the row size is bigger
> than 4096 bytes. In the backtrace above, tuple_size is 7049 and
> max_num_tuples is 595, yet ReadSlots is called with num_to_read=1024.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)