[ https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184749#comment-17184749 ]
ASF subversion and git services commented on IMPALA-9957: --------------------------------------------------------- Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Reviewed-on: http://gerrit.cloudera.org:8080/16240 Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com> > Impalad crashes when serializing large rows in aggregation spilling > ------------------------------------------------------------------- > > Key: IMPALA-9957 > URL: https://issues.apache.org/jira/browse/IMPALA-9957 > Project: IMPALA > Issue Type: Bug > Reporter: Quanlong Huang > Assignee: Quanlong Huang > Priority: Critical > > Queries to reproduce the crash using the testdata: > {code:sql} > create table bigstrs stored as parquet as > select *, repeat(uuid(), cast(random() * 100000 as int)) as bigstr > from functional.alltypes; > set MAX_ROW_SIZE=3.5MB; > set MEM_LIMIT=4GB; > create table my_str_group stored as parquet as > select group_concat(string_col) as ss, bigstr > from bigstrs group by bigstr; > {code} > The last query 1) has large rows, 2) needs spilling in aggregation 3) has > aggregation on functions needs serialize (e.g. group_concat, appx_median, > min(string), etc). With these 3 conditions, it will trigger this bug. > The crash stacktraces are different in different build modes. Crash > stacktrace in RELEASE build with codegen enabled: > {code:java} > Thread 316 (crashed) > 0 impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0] > 1 impalad!impala::GroupingAggregator::Partition::Spill(bool) > [grouping-aggregator-partition.cc : 180 + 0x9] > 2 impalad!impala::GroupingAggregator::SpillPartition(bool) > [grouping-aggregator.cc : 904 + 0x10] > 3 0x7f5fba83db3c > 4 impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, > impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2] > 5 impalad!impala::AggregationNode::Open(impala::RuntimeState*) > [aggregation-node.cc : 70 + 0x6] > 6 libstdc++.so.6.0.24 + 0x120b28 > 7 > impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&) > const [converter_lexical_streams.hpp : 161 + 0x8] > 8 impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc > : 396 + 0x11] > 9 impalad!tc_newarray + 0x171 > {code} > Crash stacktrace in RELEASE build with codegen disabled (set > DISABLE_CODEGEN=true): > {code:java} > Thread 320 (crashed) > 0 impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0] > 1 impalad!impala::GroupingAggregator::Partition::Spill(bool) > [grouping-aggregator-partition.cc : 180 + 0x9] > 2 impalad!impala::GroupingAggregator::SpillPartition(bool) > [grouping-aggregator.cc : 904 + 0x10] > 3 impalad!impala::Status > impala::GroupingAggregator::AddBatchImpl<false>(impala::RowBatch*, > impala::TPrefetchMode::type, impala::HashTableCtx*) > [grouping-aggregator-ir.cc : 148 + 0x11] > 4 impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, > impala::RowBatch*) [grouping-aggregator.cc : 439 + 0x5] > 5 impalad!impala::AggregationNode::Open(impala::RuntimeState*) > [aggregation-node.cc : 70 + 0x6] > 6 impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc > : 396 + 0x11] > 7 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc > : 97 + 0x12] > 8 impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) > [query-state.cc : 815 + 0x19] > 9 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, > std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, > impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 770 > + 0x7] > 10 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > const&, std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, boost::function<void > ()>, impala::ThreadDebugInfo const*, impala::Promise<long, > (impala::PromiseMode)0>*), > boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > >, > boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, > boost::_bi::value<impala::ThreadDebugInfo*>, > boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > > >::run() [bind.hpp : 531 + 0xc] > 11 impalad!thread_proxy + 0x72 > 12 libpthread-2.23.so + 0x76ba > 13 libc-2.23.so + 0x1074dd > {code} > Crash stacktrace in DEBUG build with codegen disabled is a bit ealier - > crashed at a DCHECK: > {code:java} > F0715 20:29:24.389505 16868 grouping-aggregator-partition.cc:125] > 1d4b40df02e6ad76:433ed57400000003] Check failed: !status.ok() Stream was > unpinned - AddRow() only fails on error > *** Check failure stack trace: *** > @ 0x513f31c google::LogMessage::Fail() > @ 0x5140c0c google::LogMessage::SendToLog() > @ 0x513ec7a google::LogMessage::Flush() > @ 0x5142878 google::LogMessageFatal::~LogMessageFatal() > @ 0x28b2ca7 > impala::GroupingAggregator::Partition::SerializeStreamForSpilling() > @ 0x28b360f impala::GroupingAggregator::Partition::Spill() > @ 0x28a4122 impala::GroupingAggregator::SpillPartition() > @ 0x28b169c impala::GroupingAggregator::AddIntermediateTuple<>() > @ 0x28b09d9 impala::GroupingAggregator::ProcessRow<>() > @ 0x28af535 impala::GroupingAggregator::AddBatchImpl<>() > @ 0x289f6ad impala::GroupingAggregator::AddBatch() > @ 0x28db463 impala::AggregationNode::Open() > @ 0x22598bd impala::FragmentInstanceState::Open() > @ 0x22562f4 impala::FragmentInstanceState::Exec() > @ 0x22801ed impala::QueryState::ExecFInstance() > @ 0x227e5ef > _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv > @ 0x2281d8e > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x204d7d1 boost::function0<>::operator()() > @ 0x26702d5 impala::Thread::SuperviseThread() > @ 0x2678272 boost::_bi::list5<>::operator()<>() > @ 0x2678196 boost::_bi::bind_t<>::operator()() > @ 0x2678157 boost::detail::thread_data<>::run() > @ 0x3e45d71 thread_proxy > @ 0x7ff1bfdc46b9 start_thread > @ 0x7ff1bc98f4dc clone > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org