> On April 25, 2013, 8:07 p.m., Benjamin Hindman wrote:
> > Can you elaborate on why AtMost(1) was not sufficient?
>
> Thomas Marshall wrote:
> I honestly have no idea, although I just realized that the JIRA issue I
> linked to is actually referring to a different problem. That issue (timing
> out on waiting for the registered future) is fixed by the recent change that
> extended the amount of time that we wait on futures. The problem being fixed
> by this patch looks something more like:
>
> ...
> I0425 13:37:03.516690 30336 hierarchical_allocator_process.hpp:423] Removed slave 201304251337-16842879-37747-30328-0
> I0425 13:37:03.516121 30334 slave.cpp:486] Slave asked to shut down by [email protected]:37747
> I0425 13:37:03.517014 30334 slave.cpp:1099] Asked to shut down framework 201304251337-16842879-37747-30328-0000 by [email protected]:37747
> W0425 13:37:03.517163 30334 slave.cpp:1120] Ignoring shutdown framework 201304251337-16842879-37747-30328-0000 because it is terminating
> I0425 13:37:03.518982 30334 slave.cpp:1867] [email protected]:37747 exited
> [Thread 0x7fffb0cff700 (LWP 30378) exited]
> W0425 13:37:03.519126 30334 slave.cpp:1870] Master disconnected! Waiting for a new master to be elected
> [Thread 0x7fffabfff700 (LWP 30379) exited]
> I0425 13:37:03.521286 30328 slave.cpp:441] Slave terminating
> I0425 13:37:03.521467 30328 slave.cpp:1099] Asked to shut down framework 201304251337-16842879-37747-30328-0000 by @0.0.0.0:0
> W0425 13:37:03.521649 30328 slave.cpp:1120] Ignoring shutdown framework 201304251337-16842879-37747-30328-0000 because it is terminating
> [Thread 0x7fffb1d01700 (LWP 30374) exited]
> [Thread 0x7fffab7fe700 (LWP 30376) exited]
>
> Program received signal SIGPIPE, Broken pipe.
> [Switching to Thread 0x7fffb230e700 (LWP 30370)]
> 0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) where
> #0 0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007fffb2316ab2 in Java_sun_nio_ch_FileDispatcherImpl_write0 () from /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/libnio.so
> #2 0x00007fffcc39ef90 in ?? ()
> #3 0x0000000000000000 in ?? ()
>
> That error is occurring somewhere deep down inside ZooKeeper, and I don't
> really know what's causing it, other than obviously something related to the
> slave not being fully shut down yet when we shut down the master. If it seems
> important to you, I can continue investigating, but I suspect that it's a
> quirk of how our test infrastructure interacts with ZooKeeper, and I don't
> think that it's likely to come up in practice.

So, I don't see an error in the output that you've shown. When running tests with gdb and a JVM (which you get with ZooKeeper tests), you need to ignore most signals, since that's the JVM's normal operation. I did see a segfault that the JVM propagated in a previous test run, but it's not clear to me why this fix will cover that segfault ...


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10786/#review19724
-----------------------------------------------------------


On April 25, 2013, 8:05 p.m., Thomas Marshall wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10786/
> -----------------------------------------------------------
>
> (Updated April 25, 2013, 8:05 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
>
>
> Description
> -------
>
> See summary.
>
>
> This addresses bug MESOS-441.
> https://issues.apache.org/jira/browse/MESOS-441
>
>
> Diffs
> -----
>
> src/tests/allocator_zookeeper_tests.cpp 2c7deb1
>
> Diff: https://reviews.apache.org/r/10786/diff/
>
>
> Testing
> -------
>
> bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 --gtest_filter=*AllocatorZoo*
>
>
> Thanks,
>
> Thomas Marshall
>
