[ https://issues.apache.org/jira/browse/IMPALA-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6330.
-----------------------------------
    Resolution: Fixed

commit 604e48d2f3f86b18f7d588d1861e17f177dfedc9
Author: Philip Zeyliger <[email protected]>
Date:   Tue Dec 19 10:51:57 2017 -0800

    IMPALA-6330, IMPALA-5702: Avoid boost's trim() to work around crash after dynamic linking.
    
    Replaces boost::algorithm::trim() with std::string methods when parsing
    /proc/self/smaps and adds a trivial unit test for MemInfo::ParseSmaps().
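
As a rough illustration of the kind of replacement described above, the sketch below trims whitespace with std::string methods only, so no std::locale lookup is involved. It is not the actual Impala change: TrimAscii and ParseSmapsLine are hypothetical names, and the line format ("Key:   <number> kB") is an assumption based on how /proc/self/smaps usually looks.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not Impala's actual code): strips ASCII whitespace
// from both ends of `s` using only std::string methods, so neither
// std::locale nor std::use_facet is touched.
std::string TrimAscii(const std::string& s) {
  static const char kWhitespace[] = " \t\n\r\f\v";
  const std::string::size_type begin = s.find_first_not_of(kWhitespace);
  if (begin == std::string::npos) return "";  // empty or all whitespace
  const std::string::size_type end = s.find_last_not_of(kWhitespace);
  return s.substr(begin, end - begin + 1);
}

// Hypothetical parser for one smaps-style line such as
// "AnonHugePages:      2048 kB". Returns true and fills the outputs only
// when the line has the assumed "<key>: <digits> kB" shape.
bool ParseSmapsLine(const std::string& line, std::string* key,
                    int64_t* value_kb) {
  const std::string::size_type colon = line.find(':');
  if (colon == std::string::npos) return false;
  const std::string rest = TrimAscii(line.substr(colon + 1));
  const std::string::size_type digits_end =
      rest.find_first_not_of("0123456789");
  // Require at least one digit followed by a unit.
  if (digits_end == 0 || digits_end == std::string::npos) return false;
  if (TrimAscii(rest.substr(digits_end)) != "kB") return false;
  int64_t value = 0;
  for (std::string::size_type i = 0; i < digits_end; ++i) {
    value = value * 10 + (rest[i] - '0');  // overflow is not checked here
  }
  *key = TrimAscii(line.substr(0, colon));
  *value_kb = value;
  return true;
}
```

Mapping header lines and "VmFlags:" lines are rejected by this sketch because they do not end in a number followed by "kB".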
    
    I did *not* replace other uses of trim() with equivalents from
    be/src/gutil/strings/strip.h at this moment.
    
    The backstory here is that
    TestAdmissionControllerStress::test_admission_controller_with_flags
    fails occasionally on dynamically linked builds of Impala. I was able
    to reproduce the failure reliably (within 3 tries) with the following:
    
      $ ./buildall.sh -notests -so -noclean
      $ bin/start-impala-cluster.py --impalad_args="--memory_maintenance_sleep_time_ms=1"
      $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;'
    
    The stack trace looks like:

    
      (gdb) bt
      #0  0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
      #1  0x00007fe230df402a in __GI_abort () at abort.c:89
      #2  0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
      #3  0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
      #4  0x00007fe2330d8bb1 in std::terminate() () at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
      #5  0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>) at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
      #6  0x00007fe233110332 in std::__throw_bad_cast() () at ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63
      #7  0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale const&) (__loc=...) at /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137
      #8  0x00000000008d2cdf in void boost::algorithm::trim<std::string>(std::string&, std::locale const&) ()
      #9  0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at /home/philip/src/Impala/be/src/util/mem-info.cc:132
      ...
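
Frames #5 through #7 show std::use_facet throwing std::bad_cast, which is what the standard library does when the requested facet is not present in the locale; with no handler on the thread, the throw ends in std::terminate. The sketch below only illustrates that behaviour with a made-up facet type (DemoFacet and UseFacetThrows are hypothetical names). It does not reproduce the dynamic-linking problem itself.

```cpp
#include <locale>
#include <typeinfo>

// Hypothetical facet type that is never installed in any locale, used only
// to show the failure mode seen in frame #7 of the trace.
struct DemoFacet : std::locale::facet {
  static std::locale::id id;
};
std::locale::id DemoFacet::id;

// Returns true if std::use_facet<Facet>(loc) throws std::bad_cast,
// i.e. if `loc` does not contain a facet of that type.
template <typename Facet>
bool UseFacetThrows(const std::locale& loc) {
  try {
    (void)std::use_facet<Facet>(loc);
    return false;
  } catch (const std::bad_cast&) {
    return true;
  }
}
```

In a healthy process, UseFacetThrows<std::ctype<char>>(std::locale::classic()) is expected to return false, since every locale carries the standard ctype<char> facet. The crash above suggests that this invariant was somehow violated at runtime.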



    My best theory is that there's a race/bug, wherein the std::locale* static
    initialization work is getting somehow 'reset' by the dynamic linker, when
    more libraries are linked in as a result of the query. My evidence to
    support this theory is scant, but I do notice that LD_DEBUG=all prints the
    following when the query is executed (but not right at startup):
    
      binding file /home/philip/src/Impala/toolchain/gcc-4.9.2/lib64/libstdc++.so.6 [0] to
      /home/philip/src/Impala/toolchain/gflags-2.2.0-p1/lib/libgflags.so.2.2 [0]:
      normal symbol `std::locale::facet::_S_destroy_c_locale(__locale_struct*&)'
    
    Note that there are BSS symbols for some of std::locale::facet::* inside
    of libgflags.so.
    
      $ nm toolchain/gflags-2.2.0-p1/lib/libgflags.so | c++filt | grep facet | grep ' B '
      00000000002e2d10 B std::locale::facet::_S_c_locale
      00000000002e2d0c B std::locale::facet::_S_once
    
    I'm not the first to run into variants of these issues, though the results
    are fairly unhelpful:
    
      http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/faq.html
      https://stackoverflow.com/questions/26990412/c-boost-crashes-while-using-locale
      https://svn.boost.org/trac10/ticket/4671
      http://clang-developers.42468.n3.nabble.com/std-use-facet-lt-std-ctype-lt-char-gt-gt-crashes-on-linux-td4033967.html
      https://unix.stackexchange.com/questions/719/can-we-get-compiler-information-from-an-elf-binary
      https://stackoverflow.com/questions/42376100/linking-with-library-causes-collate-facet-to-be-missing-from-char
      http://lists.llvm.org/pipermail/cfe-dev/2012-July/023289.html
      https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html
    
    Change-Id: I8dd807f869a9359d991ba515177fb2298054520e
    Reviewed-on: http://gerrit.cloudera.org:8080/8888
    Reviewed-by: Philip Zeyliger <[email protected]>
    Tested-by: Impala Public Jenkins




> impalad crash with --memory_maintenance_sleep_time_ms=1
> -------------------------------------------------------
>
>                 Key: IMPALA-6330
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6330
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Philip Zeyliger
>            Priority: Major
>
> In a typical development environment (Ubuntu 16.04), I'm seeing the following:
> {code}
> $ bin/start-impala-cluster.py --impalad_args="--memory_maintenance_sleep_time_ms=1"
> $ impala-shell.sh --query 'select max(t.c1), avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6) from (select max(tinyint_col) over (order by int_col) c1, avg(tinyint_col) over (order by smallint_col) c2, min(tinyint_col) over (order by smallint_col desc) c3, rank() over (order by int_col desc) c4, dense_rank() over (order by bigint_col) c5, first_value(tinyint_col) over (order by bigint_col desc) c6 from functional.alltypes) t;'
> ...
> Error communicating with impalad: TSocket read 0 bytes
> ...
> # # CRASH!
> {code}
> I saw this originally in an atypical environment (Docker), and the bug is 
> adapted from {{tests/custom_cluster/test_mem_reservations.py}} failing in 
> that environment. I was able to get it to reproduce by tuning the timing.
> The stack trace I see is:
> {code}
> (gdb) bt
> #0  0x00007fe230df2428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007fe230df402a in __GI_abort () at abort.c:89
> #2  0x00007fe23312026d in __gnu_cxx::__verbose_terminate_handler() () at 
> ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/vterminate.cc:95
> #3  0x00007fe2330d8b66 in __cxxabiv1::__terminate(void (*)()) 
> (handler=<optimized out>) at 
> ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:47
> #4  0x00007fe2330d8bb1 in std::terminate() () at 
> ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_terminate.cc:57
> #5  0x00007fe2330d8cb8 in __cxxabiv1::__cxa_throw(void*, std::type_info*, 
> void (*)(void*)) (obj=0x8e54080, tinfo=0x7fe233356210 <typeinfo for 
> std::bad_cast>, dest=0x7fe23311ea70 <std::bad_cast::~bad_cast()>)
>     at ../../../../gcc-4.9.2/libstdc++-v3/libsupc++/eh_throw.cc:87
> #6  0x00007fe233110332 in std::__throw_bad_cast() () at 
> ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/functexcept.cc:63
> #7  0x00007fe2330e8ad7 in std::use_facet<std::ctype<char> >(std::locale 
> const&) (__loc=...)
>     at 
> /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.tcc:137
> #8  0x00000000008d2cdf in void 
> boost::algorithm::trim<std::string>(std::string&, std::locale const&) ()
> #9  0x00007fe2396d5057 in impala::MemInfo::ParseSmaps() () at 
> /home/philip/src/Impala/be/src/util/mem-info.cc:132
> #10 0x00007fe2396d74ce in impala::AggregateMemoryMetrics::Refresh() () at 
> /home/philip/src/Impala/be/src/util/memory-metrics.cc:141
> #11 0x00007fe239cea7c8 in MemoryMaintenanceThread() () at 
> /home/philip/src/Impala/be/src/common/init.cc:154
> #12 0x00007fe239cefd25 in 
> boost::detail::function::void_function_invoker0<void (*)(), 
> void>::invoke(boost::detail::function::function_buffer&) (function_ptr=...) 
> at 
> /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:112
> #13 0x00007fe2399c8122 in boost::function0<void>::operator()() const 
> (this=0x7fe1dc74cce0) at 
> /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> #14 0x00007fe2397555c1 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::Promise<long>*) 
> (name="memory-maintenance-thread", category="common", functor=..., 
> thread_started=0x7fffca2710a0)
>     at /home/philip/src/Impala/be/src/util/thread.cc:352
> #15 0x00007fe23975ed38 in boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void 
> (*&)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list0&, int) (this=0x7987dc0, 
> f=@0x7987db8: 0x7fe2397552a2 <impala::Thread::SuperviseThread(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*)>, a=...) at 
> /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:457
> #16 0x00007fe23975ec7b in boost::_bi::bind_t<void, void (*)(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0x7987db8) 
> at 
> /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20
> #17 0x00007fe23975ec3e in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0x7987c00) at 
> /home/philip/src/Impala/toolchain/boost-1.57.0-p3/include/boost/thread/detail/thread.hpp:116
> #18 0x00000000008d059a in thread_proxy ()
> #19 0x00007fe23118e6ba in start_thread (arg=0x7fe1dc74d700) at 
> pthread_create.c:333
> #20 0x00007fe230ec43dd in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> {code}
> Hanging out with this in gdb, I don't think the problem is likely in 
> {{boost::trim()}} or with locales. If the problem were as simple as that, it 
> would have failed considerably more regularly. I've added a unit test for 
> ParseSmaps which has no trouble passing.
> I'm going fishing for it; wish me luck! My best guess is an interaction 
> between BufferPool::Maintenance() and the usage of those buffers. I'm going 
> to see if TSAN or ASAN builds help me out.
> [~tarmstrong], I assume you'll be curious about this one.



