[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203998#comment-17203998 ]

Yida Wu edited comment on IMPALA-10102 at 10/1/20, 12:16 AM:
-------------------------------------------------------------

The crash happens occasionally with the following settings.
 1. Modify the test script in
testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test:
{quote}set mem_limit="{color:#ff0000}8gb{color}";
 create table bigstrs3 stored as parquet as
 select *, repeat(uuid(), cast(random() * 200000 as int)) as bigstr
 from functional.alltypes limit {color:#ff0000}1000{color};
{quote}
2. Start the Impala cluster:
 $IMPALA_HOME/bin/start-impala-cluster.py --impalad_args="--mt_dop_auto_fallback=true"
 3. Run the test:
 impala-py.test tests/query_test/test_spilling.py -k large

The reason for the crash, in hdfs-parquet-table-writer.cc:
{quote}uint8_t* {color:#ff0000}compressed_data{color} =
 parent_->per_file_mem_pool_->Allocate(max_compressed_size);
{quote}
The allocation of compressed_data fails because there is not enough space left 
in the memory pool. However, the code does not check the result, which leads to 
the crash. The simple way to avoid it is to verify the pointer returned by 
Allocate and return an error status if it is null.
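
A minimal sketch of that check, assuming the enclosing method can propagate a 
Status; the error message is illustrative only, not final wording:
{code:cpp}
uint8_t* compressed_data =
    parent_->per_file_mem_pool_->Allocate(max_compressed_size);
if (compressed_data == nullptr) {
  // The pool could not satisfy the request; fail the query with a clear
  // error instead of crashing later inside the Snappy compressor.
  return Status(Substitute(
      "Failed to allocate $0 bytes for a compressed Parquet page.",
      max_compressed_size));
}
{code}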

There is another issue I hit while testing this case: one of the impalad 
processes was killed by Linux due to OOM. I assume this happens because my 
local box does not have enough memory for the high memory limit (8gb) set for 
each process. It looks more like a configuration issue than a bug in the 
system: if the memory limit is set extremely high compared to the machine's 
memory capacity, OOM is inevitable once all the memory is used up.
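
For example, on a small box the OOM kill can presumably be avoided by capping 
each impalad's process memory to fit the machine; the 4gb figure below is only 
an assumption for illustration:
 $IMPALA_HOME/bin/start-impala-cluster.py --impalad_args="--mem_limit=4gb --mt_dop_auto_fallback=true"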

So I am wondering whether adding a check on the result of Allocate is enough 
for this Jira, if it solves the problem. [~arawat] [~stigahuang]


> Impalad crashes when writing a parquet file with large rows
> ------------------------------------------------------------
>
>                 Key: IMPALA-10102
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10102
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Yida Wu
>            Priority: Critical
>              Labels: crash
>
> Encountered a crash when testing following queries on my local branch:
> {code:sql}
> create table bigstrs3 stored as parquet as
> select *, repeat(uuid(), cast(random() * 200000 as int)) as bigstr
> from functional.alltypes
> limit 1000;
> -- Length of uuid() is 36, so the max row size is 36 * 200,000 = 7,200,000 bytes.
> set MAX_ROW_SIZE=8m;
> create table my_str_group stored as parquet as
>   select group_concat(string_col) as ss, bigstr
>   from bigstrs3 group by bigstr;
> create table my_cnt stored as parquet as
>   select count(*) as cnt, bigstr
>   from bigstrs3 group by bigstr;
> {code}
> The crash stacktrace:
> {code}
> Crash reason:  SIGSEGV
> Crash address: 0x0
> Process uptime: not available
> Thread 336 (crashed)
>  0  libc-2.23.so + 0x14e10b
>  1  impalad!snappy::UncheckedByteArraySink::Append(char const*, unsigned 
> long) [clone .localalias.0] + 0x1a 
>  2  impalad!snappy::Compress(snappy::Source*, snappy::Sink*) + 0xb1 
>  3  impalad!snappy::RawCompress(char const*, unsigned long, char*, unsigned 
> long*) + 0x51 
>  4  impalad!impala::SnappyCompressor::ProcessBlock(bool, long, unsigned char 
> const*, long*, unsigned char**) [compress.cc : 295 + 0x24]
>  5  impalad!impala::Codec::ProcessBlock32(bool, int, unsigned char const*, 
> int*, unsigned char**) [codec.cc : 211 + 0x41]
>  6  impalad!impala::HdfsParquetTableWriter::BaseColumnWriter::Flush(long*, 
> long*, long*) [hdfs-parquet-table-writer.cc : 775 + 0x56]
>  7  impalad!impala::HdfsParquetTableWriter::FlushCurrentRowGroup() 
> [hdfs-parquet-table-writer.cc : 1330 + 0x60]
>  8  impalad!impala::HdfsParquetTableWriter::Finalize() 
> [hdfs-parquet-table-writer.cc : 1297 + 0x19]
>  9  
> impalad!impala::HdfsTableSink::FinalizePartitionFile(impala::RuntimeState*, 
> impala::OutputPartition*) [hdfs-table-sink.cc : 652 + 0x2e]
> 10  
> impalad!impala::HdfsTableSink::WriteRowsToPartition(impala::RuntimeState*, 
> impala::RowBatch*, std::pair<std::unique_ptr<impala::OutputPartition, 
> std::default_delete<impala::OutputPartition> >, std::vector<int, 
> std::allocator<int> > >*) [hdfs-table-sink.cc : 282 + 0x21]
> 11  impalad!impala::HdfsTableSink::Send(impala::RuntimeState*, 
> impala::RowBatch*) [hdfs-table-sink.cc : 621 + 0x2e]
> 12  impalad!impala::FragmentInstanceState::ExecInternal() 
> [fragment-instance-state.cc : 422 + 0x58]
> 13  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc 
> : 106 + 0x16]
> 14  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> [query-state.cc : 836 + 0x19]
> 15  impalad!impala::QueryState::StartFInstances()::{lambda()#1}::operator()() 
> const + 0x26 
> 16  
> impalad!boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::<lambda()>,
>  void>::invoke [function_template.hpp : 159 + 0xc] 
> 17  impalad!boost::function0<void>::operator()() const [function_template.hpp 
> : 770 + 0x1d]
> 18  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0xf]
> 19  impalad!void 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> 
> >::operator()<void (*)(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*), 
> boost::_bi::list0>(boost::_bi::type<void>, void 
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) [bind.hpp : 531 + 0x15]
> 20  impalad!boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > 
> >::operator()() [bind.hpp : 1222 + 0x22]
> 21  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() [thread.hpp : 116 + 0x12]
> 22  impalad!thread_proxy + 0x72 
> 23  libpthread-2.23.so + 0x76ba
> 24  libc-2.23.so + 0x1074dd
> {code}


