Re: Review Request: Fix the flaky AllocatorZookeeperTests

Thomas Marshall Thu, 25 Apr 2013 14:07:14 -0700


> On April 25, 2013, 8:07 p.m., Benjamin Hindman wrote:
> > Can you elaborate on why AtMost(1) was not sufficient?


I honestly have no idea, although I just realized that the JIRA issue I linked 
to actually refers to a different problem. That issue (timing out while 
waiting for the registered future) is fixed by the recent change that extended 
the amount of time that we wait on futures. The problem being fixed by this 
patch looks more like this:

...
I0425 13:37:03.516690 30336 hierarchical_allocator_process.hpp:423] Removed 
slave 201304251337-16842879-37747-30328-0
I0425 13:37:03.516121 30334 slave.cpp:486] Slave asked to shut down by 
[email protected]:37747
I0425 13:37:03.517014 30334 slave.cpp:1099] Asked to shut down framework 
201304251337-16842879-37747-30328-0000 by [email protected]:37747
W0425 13:37:03.517163 30334 slave.cpp:1120] Ignoring shutdown framework 
201304251337-16842879-37747-30328-0000 because it is terminating
I0425 13:37:03.518982 30334 slave.cpp:1867] [email protected]:37747 exited
[Thread 0x7fffb0cff700 (LWP 30378) exited]
W0425 13:37:03.519126 30334 slave.cpp:1870] Master disconnected! Waiting for a 
new master to be elected
[Thread 0x7fffabfff700 (LWP 30379) exited]
I0425 13:37:03.521286 30328 slave.cpp:441] Slave terminating
I0425 13:37:03.521467 30328 slave.cpp:1099] Asked to shut down framework 
201304251337-16842879-37747-30328-0000 by @0.0.0.0:0
W0425 13:37:03.521649 30328 slave.cpp:1120] Ignoring shutdown framework 
201304251337-16842879-37747-30328-0000 because it is terminating
[Thread 0x7fffb1d01700 (LWP 30374) exited]
[Thread 0x7fffab7fe700 (LWP 30376) exited]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7fffb230e700 (LWP 30370)]
0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0  0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fffb2316ab2 in Java_sun_nio_ch_FileDispatcherImpl_write0 () from 
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/libnio.so
#2  0x00007fffcc39ef90 in ?? ()
#3  0x0000000000000000 in ?? ()

That error is occurring somewhere deep down inside ZooKeeper, and I don't 
really know what's causing it, other than that it is evidently related to the 
slave not being fully shut down yet when we shut down the master. If it seems 
important to you, I can continue investigating, but I suspect that it's a quirk 
of how our test infrastructure interacts with ZooKeeper, and I don't think that 
it's likely to come up in practice.
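
For reference, here is a minimal sketch of the general POSIX mechanism behind 
a SIGPIPE like the one in the backtrace: a write() to a pipe or socket whose 
other end has already gone away. This is not what the patch does and not 
taken from the Mesos or ZooKeeper sources; the function name 
writeToClosedPipe is hypothetical, and a plain pipe stands in for the 
connection. With SIGPIPE ignored, the same write() fails with EPIPE instead of 
delivering the signal.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper, for illustration only.
// Writes one byte to a pipe whose read end is already closed.
// Returns the errno reported by write(), 0 if the write unexpectedly
// succeeded, or -1 if the pipe could not be created.
int writeToClosedPipe()
{
  // Ignore SIGPIPE so that write() fails with EPIPE instead of the default
  // action (terminating the process). Note: this is process-wide state.
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }

  // Close the read end first: the "peer" is gone before we write, similar to
  // one side of a connection being torn down while the other is still active.
  close(fds[0]);

  char byte = 'x';
  ssize_t n = write(fds[1], &byte, 1);
  int error = (n == -1) ? errno : 0;

  close(fds[1]);
  return error;
}
```

Calling writeToClosedPipe() from a small main() should return EPIPE on Linux; 
without the std::signal() line, the default SIGPIPE action would terminate the 
process at the write() call.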


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10786/#review19724
-----------------------------------------------------------


On April 25, 2013, 8:05 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10786/
> -----------------------------------------------------------
> 
> (Updated April 25, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> This addresses bug MESOS-441.
>     https://issues.apache.org/jira/browse/MESOS-441
> 
> 
> Diffs
> -----
> 
>   src/tests/allocator_zookeeper_tests.cpp 2c7deb1 
> 
> Diff: https://reviews.apache.org/r/10786/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 
> --gtest_filter=*AllocatorZoo*
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>
