[ 
https://issues.apache.org/jira/browse/IMPALA-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662564#comment-16662564
 ] 

Tim Armstrong commented on IMPALA-7714:
---------------------------------------

The main thread is in the signal handler at the time of the crash:
{noformat}
Thread 0
 0  libc-2.23.so + 0xf72dd
    rax = 0x0000000000000101   rdx = 0x0000000000000101
    rcx = 0x00007f111d4712dd   rbx = 0x0000000000000101
    rsi = 0x0000000005208000   rdi = 0x0000000000000003
    rbp = 0x0000000005208000   rsp = 0x00007fff326f3040
     r8 = 0x0000000000034600    r9 = 0x0000000004952000
    r10 = 0x00007f11078f3140   r11 = 0x0000000000000246
    r12 = 0x0000000000000101   r13 = 0x0000000000000001
    r14 = 0x0000000005122900   r15 = 0x000000000000007f
    rip = 0x00007f111d4712dd
    Found by: given as instruction pointer in context
 1  libc-2.23.so + 0x78bff
    rsp = 0x00007fff326f3050   rip = 0x00007f111d3f2bff
    Found by: stack scanning
 2  libc-2.23.so + 0x7a409
    rsp = 0x00007fff326f3080   rip = 0x00007f111d3f4409
    Found by: stack scanning
 3  libc-2.23.so + 0x7c196
    rsp = 0x00007fff326f30b0   rip = 0x00007f111d3f6196
    Found by: stack scanning
 4  libc-2.23.so + 0x7c32a
    rsp = 0x00007fff326f3110   rip = 0x00007f111d3f632a
    Found by: stack scanning
 5  libc-2.23.so + 0x39f9b
    rsp = 0x00007fff326f3140   rip = 0x00007f111d3b3f9b
    Found by: stack scanning
 6  libc-2.23.so + 0x3a045
    rsp = 0x00007fff326f3170   rip = 0x00007f111d3b4045
    Found by: stack scanning
 7  impalad!HandleSigTerm [init.cc : 188 + 0x7]
    rsp = 0x00007fff326f3180   rip = 0x0000000000adaaf8
    Found by: stack scanning
 8  libpthread-2.23.so + 0x11390
    rsp = 0x00007fff326f31c0   rip = 0x00007f111d755390
    Found by: stack scanning
 9  libpthread-2.23.so + 0xd360
    rsp = 0x00007fff326f3260   rip = 0x00007f111d751360
    Found by: stack scanning
10  libpthread-2.23.so + 0xd360
    rsp = 0x00007fff326f3270   rip = 0x00007f111d751360
    Found by: stack scanning
11  impalad!boost::this_thread::interruption_requested() + 0x40
    rsp = 0x00007fff326f32c0   rip = 0x00000000016a1660
    Found by: stack scanning
12  impalad + 0xc2c7a0
    rsp = 0x00007fff326f3330   rip = 0x000000000102c7a0
    Found by: stack scanning
13  impalad!boost::thread::start_thread_noexcept() + 0x6a
    rsp = 0x00007fff326f3370   rip = 0x00000000016a0b1a
    Found by: stack scanning
14  0xffff00001fa0
    rbx = 0x00007f111df3718e   rbp = 0x00007fff326f4538
    rsp = 0x00007fff326f33a0   rip = 0x0000ffff00001fa0
    Found by: call frame info
15  impalad!impala::ImpalaServer::RaiseBeeswaxException(std::string const&, char const*) [impala-beeswax-server.cc : 486 + 0x8]
    rbp = 0x00007fff326f4538   rsp = 0x00007fff326f3470
    rip = 0x0000000000ff0000
    Found by: stack scanning
16  0x4dd00000000
    rbx = 0x000000000523dea0   rbp = 0x0000000000000000
    rsp = 0x00007fff326f4548   r12 = 0x00000000051ad478
    r13 = 0x00000000051ad2c0   r14 = 0x00000000051ad4a0
    r15 = 0x000000000523dea0   rip = 0x000004dd00000000
    Found by: call frame info
17  ld-2.23.so + 0xfac6
    rsp = 0x00007fff326f45b0   rip = 0x00007f1120ffeac6
    Found by: stack scanning
18  impalad!char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) + 0x50
    rsp = 0x00007fff326f46f0   rip = 0x00000000016a8d50
    Found by: stack scanning
19  0x7fff326f4780
    rbx = 0x000000000517331c   rbp = 0x00007fff326f47c0
    rsp = 0x00007fff326f4710   r12 = 0x00007fff326f47c0
    rip = 0x00007fff326f4780
    Found by: call frame info
20  impalad!main [daemon-main.cc : 33 + 0xb]
    rbp = 0x00007fff326f47c0   rsp = 0x00007fff326f4740
    rip = 0x0000000000a4d84d
    Found by: stack scanning
21  impalad!__libc_csu_init + 0x4d
    rbx = 0x00007f111df1e0b0   rbp = 0x00000000051e9de8
    rsp = 0x00007fff326f47d0   r12 = 0x00000000051732d8
    r13 = 0x000000001d3b4299   r14 = 0x00000000051e9de8
    r15 = 0x00000000026fba3d   rip = 0x00000000028d21cd
    Found by: call frame info
22  libc-2.23.so + 0x20830
    rbx = 0x0000000000000000   rbp = 0x0000000000aa8740
    rsp = 0x00007fff326f4810   r12 = 0x00007fff326f48e0
    r13 = 0x0000000000000000   r14 = 0x0000000000000000
    r15 = 0x00000000028d2180   rip = 0x00007f111d39a830
    Found by: call frame info
23  impalad!ldap_int_destroy_global_options + 0x80
    rsp = 0x00007fff326f4830   rip = 0x0000000000a4d7d0
    Found by: stack scanning
24  impalad!_GLOBAL__sub_I_json_escaping.cc + 0x30
    rsp = 0x00007fff326f4848   rip = 0x0000000000aa8740
    Found by: stack scanning
25  impalad!__libc_csu_init + 0x70
    rsp = 0x00007fff326f4890   rip = 0x00000000028d21f0
    Found by: stack scanning
26  ld-2.23.so + 0x10ab0
    rsp = 0x00007fff326f4898   rip = 0x00007f1120fffab0
    Found by: stack scanning
27  ld-2.23.so + 0x107cb
    rsp = 0x00007fff326f48a0   rip = 0x00007f1120fff7cb
    Found by: stack scanning
28  impalad!_GLOBAL__sub_I_json_escaping.cc + 0x30
    rsp = 0x00007fff326f48b8   rip = 0x0000000000aa8740
    Found by: stack scanning
29  impalad!_start + 0x29
    rsp = 0x00007fff326f48d0   rip = 0x0000000000aa8769
    Found by: stack scanning
30  0x7fff326f48d8
    rsp = 0x00007fff326f48d8   rip = 0x00007fff326f48d8
    Found by: call frame info
{noformat}

This thread owns the Statestore object, which ultimately owns the data 
structure that the crashing thread is dereferencing. I wonder if there's a 
race between shutdown on this thread and the update still running on the 
crashing thread.
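
One way to rule out this kind of race is to have the signal handler only record the shutdown request, and to let the owning thread join its workers before the shared state is destroyed. The following is a minimal sketch of that ordering, not Impala code: `TopicState`, `OwnerSketch`, `HandleSigTermSketch`, `RunShutdownSketch` and `g_shutdown_requested` are all hypothetical names, and nothing here is meant to describe what `HandleSigTerm` in init.cc actually does.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-topic state that workers update.
struct TopicState {
  std::atomic<std::int64_t> last_version{0};
};

// Set by the signal handler and polled by the workers.
std::atomic<bool> g_shutdown_requested{false};

// Hypothetical handler: it only records the request and does no teardown,
// so no object is destroyed while another thread may still be using it.
extern "C" void HandleSigTermSketch(int /*signum*/) {
  g_shutdown_requested.store(true);
}

// Hypothetical owner of the shared map, playing the role of the object
// that the main thread owns.
class OwnerSketch {
 public:
  // The map is populated once here and never modified afterwards, so
  // concurrent find() calls from the workers are safe.
  OwnerSketch() { topics_["topic"]; }

  // Joining in the destructor guarantees no worker outlives 'topics_'.
  ~OwnerSketch() { Join(); }

  void Start(int num_workers) {
    for (int i = 0; i < num_workers; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  void Join() {
    for (std::thread& t : workers_) {
      if (t.joinable()) t.join();
    }
  }

  std::int64_t Version() const {
    return topics_.at("topic").last_version.load();
  }

 private:
  void WorkerLoop() {
    while (!g_shutdown_requested.load()) {
      auto it = topics_.find("topic");
      // Checked in all build types, unlike a debug-only assertion.
      if (it == topics_.end()) return;
      it->second.last_version.fetch_add(1);
    }
  }

  std::unordered_map<std::string, TopicState> topics_;
  std::vector<std::thread> workers_;
};

// Runs the workers, delivers SIGTERM to the calling thread, and returns the
// number of updates made before the workers stopped.
std::int64_t RunShutdownSketch() {
  g_shutdown_requested.store(false);
  std::signal(SIGTERM, HandleSigTermSketch);
  std::int64_t updates = 0;
  {
    OwnerSketch owner;
    owner.Start(4);
    // Wait until at least one update has happened.
    while (owner.Version() == 0) std::this_thread::yield();
    std::raise(SIGTERM);  // handler runs on this thread before raise() returns
    assert(g_shutdown_requested.load());
    owner.Join();  // workers have stopped; destroying 'owner' is now safe
    updates = owner.Version();
  }
  std::signal(SIGTERM, SIG_DFL);
  return updates;
}
```

Calling `RunShutdownSketch()` from a `main()` exercises the whole sequence. The point of the sketch is the ordering: request shutdown, join the workers, and only then destroy the object they dereference.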

> Statestore::Subscriber::SetLastTopicVersionProcessed() crashed in AtomicInt64::Store()
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7714
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7714
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Ho
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build
>         Attachments: dbfd9687-09a9-4ab0-dcd7128b-41a2c5b3.dmp.resolved
>
>
> When running one of the custom cluster tests, 
> {{Statestore::Subscriber::SetLastTopicVersionProcessed()}} most likely 
> crashed at the following line. It could be a race, but I didn't have time to 
> dig into it further.
> {noformat}
> void Statestore::Subscriber::SetLastTopicVersionProcessed(const TopicId& topic_id,
>     TopicEntry::Version version) {
>   // Safe to call concurrently for different topics because 'subscribed_topics' is not
>   // modified.
>   Topics* subscribed_topics = GetTopicsMapForId(topic_id);
>   Topics::iterator topic_it = subscribed_topics->find(topic_id);
>   DCHECK(topic_it != subscribed_topics->end());
>   topic_it->second.last_version.Store(version); <<-----
> }
> {noformat}
> {noformat}
> Error Message
> Minidump generated: /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/logs/custom_cluster_tests/minidumps/statestored/336d9ca9-88dc-4360-6a5adf97-936db5c0.dmp
> Standard Error
> Operating system: Linux
>                   0.0.0 Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64
> CPU: amd64
>      family 6 model 85 stepping 4
>      1 CPU
> GPU: UNKNOWN
> Crash reason:  SIGSEGV
> Crash address: 0x28
> Process uptime: not available
> Thread 18 (crashed)
>  0  impalad!impala::Statestore::Subscriber::SetLastTopicVersionProcessed(std::string const&, long) [atomicops-internals-x86.h : 300 + 0x0]
>     rax = 0x0000000000000000   rdx = 0xc34174ed00000000
>     rcx = 0x0022c65a25a97b5b   rbx = 0x0000000004624e38
>     rsi = 0x0000000000000070   rdi = 0x0000000004906a79
>     rbp = 0x00007fd582d81320   rsp = 0x00007fd582d812e0
>      r8 = 0x000000009e3779b9    r9 = 0x0000000000000000
>     r10 = 0x0000000000000000   r11 = 0x00007fd58da31a90
>     r12 = 0x83bfbe948682e9da   r13 = 0x0000000004593e20
>     r14 = 0x000000000000000f   r15 = 0x000000000000000a
>     rip = 0x0000000001022a65
>     Found by: given as instruction pointer in context
>  1  impalad!impala::Statestore::SendTopicUpdate(impala::Statestore::Subscriber*, impala::Statestore::UpdateKind, bool*) [statestore.cc : 704 + 0x12]
>     rbx = 0x00007fd582d813d0   rbp = 0x00007fd582d81580
>     rsp = 0x00007fd582d81330   r12 = 0x0000000004593e00
>     r13 = 0x0000000004624dd0   r14 = 0x00007fd582d81508
>     r15 = 0x00007fd582d814f0   rip = 0x00000000010283da
>     Found by: call frame info
>  2  impalad!impala::Statestore::DoSubscriberUpdate(impala::Statestore::UpdateKind, int, impala::Statestore::ScheduledSubscriberUpdate const&) [statestore.cc : 933 + 0x23]
>     rbx = 0x0000000000000000   rbp = 0x00007fd582d817d0
>     rsp = 0x00007fd582d81590   r12 = 0x00007fd582d81840
>     r13 = 0x20c49ba5e353f7cf   r14 = 0x000001667beb277f
>     r15 = 0x00007ffc38ca1080   rip = 0x0000000001029064
>     Found by: call frame info
>  3  impalad!impala::ThreadPool<impala::Statestore::ScheduledSubscriberUpdate>::WorkerThread(int) [function_template.hpp : 767 + 0x10]
>     rbx = 0x00007ffc38ca1500   rbp = 0x00007fd582d818a0
>     rsp = 0x00007fd582d817e0   r12 = 0x00007ffc38ca1720
>     r13 = 0x00007fd582d81830   r14 = 0x00007fd582d81840
>     r15 = 0x0000000000000000   rip = 0x0000000001030bdc
>     Found by: call frame info
>  4  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 767 + 0x7]
>     rbx = 0x00007fd582d81980   rbp = 0x00007fd582d81bf0
>     rsp = 0x00007fd582d818b0   r12 = 0x0000000000000000
>     r13 = 0x0000000004658300   r14 = 0x00007fd58e6af6a0
>     r15 = 0x00007ffc38ca07a0   rip = 0x00000000010fec72
>     Found by: call frame info
>  5  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() [bind.hpp : 525 + 0x6]
>     rbx = 0x00000000045f0600   rbp = 0x00007fd582d81c50
>     rsp = 0x00007fd582d81c00   r12 = 0x00007fd582d81c10
>     r13 = 0x00000000010fe980   r14 = 0x00007fd582d82700
>     r15 = 0x00007fd58e6af6a0   rip = 0x00000000010ff7ba
>     Found by: call frame info
>  6  impalad!thread_proxy + 0xda
>     rbx = 0x0000000000000000   rbp = 0x0000000000000000
>     rsp = 0x00007fd582d81c60   r12 = 0x0000000000000000
>     r13 = 0x00007fd582d829c0   r14 = 0x00007fd582d82700
>     r15 = 0x00007fd58e6af6a0   rip = 0x00000000016a06fa
>     Found by: call frame info
>  7  libpthread-2.17.so + 0x7e25
>     rbx = 0x0000000000000000   rbp = 0x0000000000000000
>     rsp = 0x00007fd582d81ca0   r12 = 0x0000000000000000
>     r13 = 0x00007fd582d829c0   r14 = 0x00007fd582d82700
>     r15 = 0x00007fd58e6af6a0   rip = 0x00007fd58dc78e25
>     Found by: call frame info
>  8  libc-2.17.so + 0xf834d
>     rsp = 0x00007fd582d81d40   rip = 0x00007fd58d9a634d
>     Found by: stack scanning
> Thread 0
>  0  libjvm.so + 0xa7aa0f
>     rax = 0x00007fd5910e94c0   rdx = 0x00007fd590c049f0
>     rcx = 0x0000000000000003   rbx = 0x00007fd591169f50
>     rsi = 0x0000000000000000   rdi = 0x00007fd591169ee0
>     rbp = 0x00007ffc38c9fbb0   rsp = 0x00007ffc38c9fba0
>      r8 = 0x0000000000030878    r9 = 0x0000000003ddd000
>     r10 = 0x00007ffc38c9efa0   r11 = 0x00000000028d1ab0
>     r12 = 0x00000000045b4d10   r13 = 0x0000000000000000
>     r14 = 0x00000000045b4d00   r15 = 0x00000000000007f1
>     rip = 0x00007fd590c04a0f
>     Found by: given as instruction pointer in context
>  1  libc-2.17.so + 0x38dda
>     rsp = 0x00007ffc38c9fbc0   rip = 0x00007fd58d8e6dda
>     Found by: stack scanning
>  2  libjvm.so + 0x220066
>     rsp = 0x00007ffc38c9fc00   rip = 0x00007fd5903aa066
>     Found by: stack scanning
>  3  libjvm.so + 0xafae51
>     rsp = 0x00007ffc38c9fc20   rip = 0x00007fd590c84e51
>     Found by: stack scanning
>  4  ld-2.17.so + 0xfb58
>     rsp = 0x00007ffc38c9fc30   rip = 0x00007fd5915b0b58
>     Found by: stack scanning
>  5  ld-2.17.so + 0xf9fd
>     rsp = 0x00007ffc38c9fd50   rip = 0x00007fd5915b09fd
>     Found by: stack scanning
>  6  libc-2.17.so + 0x38a69
>     rsp = 0x00007ffc38c9fdc0   rip = 0x00007fd58d8e6a69
>     Found by: stack scanning
> {noformat}



