I ran with a similar patch today, and it (along with the MESOS-190 fix) addresses my segfault issues. Before, I would get 5+ per day; today has been core-file free!
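
For anyone reading along, here is a minimal sketch of the kind of locking discussed in the quoted thread. It is not the change committed in r1336417, only an illustration under assumptions: std::mutex stands in for libprocess's synchronized(this), the Socket struct is a simplified placeholder, and known(), size(), and accept_concurrently() are hypothetical names I made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical placeholder for libprocess's Socket class.
struct Socket {
  Socket(int fd = -1) : fd(fd) {}
  int fd;
};

// Simplified stand-in for SocketManager: every access to `sockets_`
// takes the same lock, so concurrent inserts cannot corrupt the map.
class SocketManager {
public:
  Socket accepted(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_[s] = Socket(s);
  }

  bool known(int s) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.count(s) > 0;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.size();
  }

private:
  std::mutex mutex_;
  std::map<int, Socket> sockets_;
};

// Hypothetical stress helper: several threads register disjoint
// descriptors at once; returns how many entries ended up in the map.
std::size_t accept_concurrently(int threads, int per_thread) {
  SocketManager manager;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&manager, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        manager.accepted(t * per_thread + i);
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
  // Same condition as the CHECK that failed in the log quoted below.
  if (threads > 0 && per_thread > 0) {
    assert(manager.known(0));
  }
  return manager.size();
}
```

Without the lock in accepted(), two threads can modify the std::map's tree at the same time, which would be consistent with the rbtree crashes and the failed sockets.count(s) > 0 check, though I have not verified that against the real code.
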

On Wed, May 9, 2012 at 2:47 PM, Benjamin Hindman <[email protected]> wrote:
> I've committed a fix in r1336417. Please let me know if this fixes the
> problem or if more needs to be done. Thank you!
>
>
> On Wed, May 9, 2012 at 1:46 PM, Benjamin Hindman
> <[email protected]> wrote:
>
>> Yes, this looks like it should be the case. :(
>>
>> I'll fix this bug ASAP. Thanks for reporting!
>>
>>
>>
>> On Wed, May 9, 2012 at 8:56 AM, Scott Smith <[email protected]> wrote:
>>
>>> I've had numerous other segfaults in libprocess, mostly in
>>> std::map/rbtree code. Is it possible that SocketManager::accepted is
>>> missing a synchronized(this) {} block?
>>>
>>> from process.cpp:
>>>
>>> Socket SocketManager::accepted(int s)
>>> {
>>>   return sockets[s] = Socket(s);
>>> }
>>>
>>> On Mon, May 7, 2012 at 11:40 PM, Scott Smith <[email protected]>
>>> wrote:
>>> > I've encountered another segfault in the slave. This time, nothing
>>> > unusual was happening. Single framework / single user. Four slaves,
>>> > one master, framework run from master.
>>> >
>>> > version:
>>> > svn Revision: 1334534 + proposed fix for MESOS-190:
>>> > https://reviews.apache.org/r/5057/diff/2/#index_header
>>> >
>>> > log messages:
>>> > I0508 06:35:21.458798 828 slave.cpp:447] Got assigned task 8:864:0
>>> > for framework 201205080535222558218-5050-29475-0004
>>> > I0508 06:35:21.459225 829 slave.cpp:689] Got acknowledgement of
>>> > status update for task 8:863:0 of framework
>>> > 201205080535222558218-5050-29475-0004
>>> > F0508 06:35:21.459432 832 process.cpp:1772] Check failed:
>>> > sockets.count(s) > 0
>>> >
>>> > stack trace:
>>> > #0 0x00007f0aecdf0445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #1 0x00007f0aecdf3bab in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #2 0x00007f0aedd65dd9 in google::DumpStackTraceAndExit () at
>>> > src/utilities.cc:145
>>> > #3 0x00007f0aedd5ed9d in google::LogMessage::Fail () at src/logging.cc:1256
>>> > #4 0x00007f0aedd6152f in google::LogMessage::SendToLog (this=0x7f0ae8a71c60)
>>> > at src/logging.cc:1216
>>> > #5 0x00007f0aedd5e99b in google::LogMessage::Flush (this=0x7f0ae8a71c60)
>>> > at src/logging.cc:1088
>>> > #6 0x00007f0aedd61dbd in google::LogMessageFatal::~LogMessageFatal (
>>> > this=0x7f0ae8a71c60, __in_chrg=<optimized out>) at src/logging.cc:1777
>>> > #7 0x00007f0aedc93a55 in process::SocketManager::next(int) ()
>>> > from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #8 0x00007f0aedc8e119 in process::send_data(ev_loop*, ev_io*, int) ()
>>> > from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #9 0x00007f0aedd9e6ef in ev_invoke_pending (loop=0x7f0aee119240) at ev.c:1971
>>> > #10 0x00007f0aedda2a24 in ev_loop (loop=0x7f0aee119240, flags=<optimized out>)
>>> > at ev.c:2333
>>> > #11 0x00007f0aedc8f30d in process::serve(void*) ()
>>> > from /home/ubuntu/cr/lib/libmesos-0.9.0.so
>>> > #12 0x00007f0aed17ee9a in start_thread () from
>>> > /lib/x86_64-linux-gnu/libpthread.so.0
>>> > #13 0x00007f0aeceac4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> > #14 0x0000000000000000 in ?? ()
>>> >
>>> > --
>>> > Scott
>>>
>>>
>>>
>>> --
>>> Scott
>>>
>>
>>

--
Scott
