Re: Review Request: Fix the flaky AllocatorZookeeperTests

Benjamin Hindman Thu, 25 Apr 2013 19:20:48 -0700


> On April 25, 2013, 8:07 p.m., Benjamin Hindman wrote:
> > Can you elaborate on why AtMost(1) was not sufficient?
> 
> Thomas Marshall wrote:
>     I honestly have no idea, although I just realized that the JIRA issue I 
> linked to is actually referring to a different problem. That issue (timing 
> out on waiting for the registered future) is fixed by the recent change that 
> extended the amount of time that we wait on futures. The problem being fixed 
> by this patch looks something more like:
>     
>     ...
>     I0425 13:37:03.516690 30336 hierarchical_allocator_process.hpp:423] 
> Removed slave 201304251337-16842879-37747-30328-0
>     I0425 13:37:03.516121 30334 slave.cpp:486] Slave asked to shut down by 
> [email protected]:37747
>     I0425 13:37:03.517014 30334 slave.cpp:1099] Asked to shut down framework 
> 201304251337-16842879-37747-30328-0000 by [email protected]:37747
>     W0425 13:37:03.517163 30334 slave.cpp:1120] Ignoring shutdown framework 
> 201304251337-16842879-37747-30328-0000 because it is terminating
>     I0425 13:37:03.518982 30334 slave.cpp:1867] [email protected]:37747 exited
>     [Thread 0x7fffb0cff700 (LWP 30378) exited]
>     W0425 13:37:03.519126 30334 slave.cpp:1870] Master disconnected! Waiting 
> for a new master to be elected
>     [Thread 0x7fffabfff700 (LWP 30379) exited]
>     I0425 13:37:03.521286 30328 slave.cpp:441] Slave terminating
>     I0425 13:37:03.521467 30328 slave.cpp:1099] Asked to shut down framework 
> 201304251337-16842879-37747-30328-0000 by @0.0.0.0:0
>     W0425 13:37:03.521649 30328 slave.cpp:1120] Ignoring shutdown framework 
> 201304251337-16842879-37747-30328-0000 because it is terminating
>     [Thread 0x7fffb1d01700 (LWP 30374) exited]
>     [Thread 0x7fffab7fe700 (LWP 30376) exited]
>     
>     Program received signal SIGPIPE, Broken pipe.
>     [Switching to Thread 0x7fffb230e700 (LWP 30370)]
>     0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
>     (gdb) where
>     #0  0x00007ffff6724ccd in write () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
>     #1  0x00007fffb2316ab2 in Java_sun_nio_ch_FileDispatcherImpl_write0 () 
> from /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/libnio.so
>     #2  0x00007fffcc39ef90 in ?? ()
>     #3  0x0000000000000000 in ?? ()
>     
>     That error is occurring somewhere deep down inside zookeeper, and I don't 
> really know what's causing it, other than obviously something related to the 
> slave not being fully shut down yet when we shut down the master. If it seems 
> important to you, I can continue investigating, but I suspect that it's a 
> quirk of how our test infrastructure interacts with zookeeper, and I don't 
> think that it's likely to come up in practice.


So, I don't see an error in the output that you've shown. When running tests 
with gdb and a JVM (which you get with ZooKeeper tests) you need to ignore most 
signals, since raising them is part of the JVM's normal operation. I did see a 
segfault that the JVM propagated in a previous test run, but it's not clear to 
me why this fix would cover that segfault ...
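
For what it's worth, a minimal sketch of what ignoring signals could look like, 
assuming a stock gdb session and assuming SIGPIPE and SIGSEGV are the signals 
getting in the way (which other signals need the same treatment depends on the 
JVM in use):

    (gdb) handle SIGPIPE nostop noprint pass
    (gdb) handle SIGSEGV nostop noprint pass

With 'pass' the signals are still delivered to the process, so the JVM's own 
handlers keep working. The catch is that gdb will then not stop at the point of 
a genuine segfault either, so it may be worth switching back with 
'handle SIGSEGV stop print' when chasing one.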


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10786/#review19724
-----------------------------------------------------------


On April 25, 2013, 8:05 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10786/
> -----------------------------------------------------------
> 
> (Updated April 25, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> This addresses bug MESOS-441.
>     https://issues.apache.org/jira/browse/MESOS-441
> 
> 
> Diffs
> -----
> 
>   src/tests/allocator_zookeeper_tests.cpp 2c7deb1 
> 
> Diff: https://reviews.apache.org/r/10786/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 
> --gtest_filter=*AllocatorZoo*
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>
