[
https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146773#comment-16146773
]
Benjamin Hindman edited comment on MESOS-7921 at 8/30/17 6:55 AM:
------------------------------------------------------------------
Added some extra logging and checks to try and suss this out, see
https://github.com/apache/mesos/commit/f7ab29936fb1bbca3c3812cc5bb274aaca10b298.
[~xujyan], if you're able to trigger something sooner than the ASF CI build,
can you let us know if you get any of these new warning log messages or
{{CHECK}} failures? Thank you!
> process::EventQueue sometimes crashes
> -------------------------------------
>
> Key: MESOS-7921
> URL: https://issues.apache.org/jira/browse/MESOS-7921
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 1.4.0
> Environment: autotools,gcc,--verbose,GLOG_v=1
> MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)
> Note that --enable-lock-free-event-queue is not enabled.
> Details:
> https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/injectedEnvVars/
> Reporter: Yan Xu
> Priority: Blocker
> Attachments:
> MesosContainerizerSlaveRecoveryTest.ResourceStatisticsFullLog.txt
>
>
> The following segfault is found on
> [ASF|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)/4159/]
> in {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} but it's flaky
> and shows up in other tests and environments (with or without
> --enable-lock-free-event-queue) as well.
> {noformat: title=Configuration}
> ./bootstrap '&&' ./configure --verbose '&&' make -j6 distcheck
> {noformat}
> {noformat:title=}
> *** Aborted at 1503937885 (unix time) try "date -d @1503937885" if you are using GNU date ***
> PC: @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty()
> *** SIGSEGV (@0x8) received by PID 751 (TID 0x2b9e31978700) from PID 8; stack trace: ***
> @ 0x2b9e29d26330 (unknown)
> @ 0x2b9e2581caa0 process::EventQueue::Consumer::empty()
> @ 0x2b9e25800a40 process::ProcessManager::resume()
> @ 0x2b9e2580f891 process::ProcessManager::init_threads()::$_9::operator()()
> @ 0x2b9e2580f7d5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_9vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2b9e2580f7a5 std::_Bind_simple<>::operator()()
> @ 0x2b9e2580f77c std::thread::_Impl<>::_M_run()
> @ 0x2b9e29fe5a60 (unknown)
> @ 0x2b9e29d1e184 start_thread
> @ 0x2b9e2a851ffd (unknown)
> make[3]: *** [CMakeFiles/check] Segmentation fault (core dumped)
> {noformat}
> A mailing list archive query shows many such instances:
> https://lists.apache.org/[email protected]:lte=1M:process%3A%3AEventQueue%3A%3AConsumer%3A%3Aempty