[
https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203998#comment-17203998
]
Yida Wu edited comment on IMPALA-10102 at 10/1/20, 12:13 AM:
-------------------------------------------------------------
The crash happens occasionally with the following settings.
1. modify test script in
testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
{quote}set mem_limit="{color:#ff0000}8gb{color}";
create table bigstrs3 stored as parquet as
select *, repeat(uuid(), cast(random() * 200000 as int)) as bigstr
from functional.alltypes limit {color:#ff0000}1000{color};
{quote}
2. start impala-cluster:
$IMPALA_HOME/bin/start-impala-cluster.py
--impalad_args="--mt_dop_auto_fallback=true"
3. run test:
impala-py.test tests/query_test/test_spilling.py -k large
The reason for the crash:
hdfs-parquet-table-writer.cc:
{quote}uint8_t* {color:#ff0000}compressed_data{color} =
parent_->per_file_mem_pool_->Allocate(max_compressed_size);
{quote}
The allocation of compressed_data fails because there is not enough space left
in the memory pool. However, the code does not check the result, so the null
pointer is dereferenced and the process crashes. The simple way to avoid the
crash is to verify the returned pointer after Allocate is called and return an
error status if it is null.
There is also another issue I hit while testing this case: one of the impalad
processes was killed by the Linux OOM killer. I assume this happens because my
local box lacks the memory for the high memory limit (8gb) set for each
process; it looks like a configuration issue more than a bug in the system. If
the memory limit is set extremely high compared to the machine's memory
capacity, an OOM kill is inevitable once all the memory is used up. So I am
wondering whether adding a verification on the Allocate result is enough for
this Jira, if it solves the crash. [~arawat] [~stigahuang]
> Impalad crashes when writing a parquet file with large rows
> -------------------------------------------------------------
>
> Key: IMPALA-10102
> URL: https://issues.apache.org/jira/browse/IMPALA-10102
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: Yida Wu
> Priority: Critical
> Labels: crash
>
> Encountered a crash when testing following queries on my local branch:
> {code:sql}
> create table bigstrs3 stored as parquet as
> select *, repeat(uuid(), cast(random() * 200000 as int)) as bigstr
> from functional.alltypes
> limit 1000;
> # Length of uuid() is 36. So the max row size is 7,200,000.
> set MAX_ROW_SIZE=8m;
> create table my_str_group stored as parquet as
> select group_concat(string_col) as ss, bigstr
> from bigstrs3 group by bigstr;
> create table my_cnt stored as parquet as
> select count(*) as cnt, bigstr
> from bigstrs3 group by bigstr;
> {code}
> The crash stacktrace:
> {code}
> Crash reason: SIGSEGV
> Crash address: 0x0
> Process uptime: not available
> Thread 336 (crashed)
> 0 libc-2.23.so + 0x14e10b
> 1 impalad!snappy::UncheckedByteArraySink::Append(char const*, unsigned
> long) [clone .localalias.0] + 0x1a
> 2 impalad!snappy::Compress(snappy::Source*, snappy::Sink*) + 0xb1
> 3 impalad!snappy::RawCompress(char const*, unsigned long, char*, unsigned
> long*) + 0x51
> 4 impalad!impala::SnappyCompressor::ProcessBlock(bool, long, unsigned char
> const*, long*, unsigned char**) [compress.cc : 295 + 0x24]
> 5 impalad!impala::Codec::ProcessBlock32(bool, int, unsigned char const*,
> int*, unsigned char**) [codec.cc : 211 + 0x41]
> 6 impalad!impala::HdfsParquetTableWriter::BaseColumnWriter::Flush(long*,
> long*, long*) [hdfs-parquet-table-writer.cc : 775 + 0x56]
> 7 impalad!impala::HdfsParquetTableWriter::FlushCurrentRowGroup()
> [hdfs-parquet-table-writer.cc : 1330 + 0x60]
> 8 impalad!impala::HdfsParquetTableWriter::Finalize()
> [hdfs-parquet-table-writer.cc : 1297 + 0x19]
> 9
> impalad!impala::HdfsTableSink::FinalizePartitionFile(impala::RuntimeState*,
> impala::OutputPartition*) [hdfs-table-sink.cc : 652 + 0x2e]
> 10
> impalad!impala::HdfsTableSink::WriteRowsToPartition(impala::RuntimeState*,
> impala::RowBatch*, std::pair<std::unique_ptr<impala::OutputPartition,
> std::default_delete<impala::OutputPartition> >, std::vector<int,
> std::allocator<int> > >*) [hdfs-table-sink.cc : 282 + 0x21]
> 11 impalad!impala::HdfsTableSink::Send(impala::RuntimeState*,
> impala::RowBatch*) [hdfs-table-sink.cc : 621 + 0x2e]
> 12 impalad!impala::FragmentInstanceState::ExecInternal()
> [fragment-instance-state.cc : 422 + 0x58]
> 13 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc
> : 106 + 0x16]
> 14 impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*)
> [query-state.cc : 836 + 0x19]
> 15 impalad!impala::QueryState::StartFInstances()::{lambda()#1}::operator()()
> const + 0x26
> 16
> impalad!boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::<lambda()>,
> void>::invoke [function_template.hpp : 159 + 0xc]
> 17 impalad!boost::function0<void>::operator()() const [function_template.hpp
> : 770 + 0x1d]
> 18 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0xf]
> 19 impalad!void
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*>
> >::operator()<void (*)(std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&,
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*,
> impala::Promise<long, (impala::PromiseMode)0>*),
> boost::_bi::list0>(boost::_bi::type<void>, void
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) [bind.hpp : 531 + 0x15]
> 20 impalad!boost::_bi::bind_t<void, void
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> >
> >::operator()() [bind.hpp : 1222 + 0x22]
> 21 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long,
> (impala::PromiseMode)0>*),
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >
> >::run() [thread.hpp : 116 + 0x12]
> 22 impalad!thread_proxy + 0x72
> 23 libpthread-2.23.so + 0x76ba
> 24 libc-2.23.so + 0x1074dd
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]