> On April 25, 2013, 8:07 p.m., Benjamin Hindman wrote:
> > Can you elaborate on why AtMost(1) was not sufficient?
> 
> Thomas Marshall wrote:
>     I honestly have no idea, although I just realized that the JIRA issue I 
> linked to is actually referring to a different problem. That issue (timing 
> out on waiting for the registered future) is fixed by the recent change that 
> extended the amount of time that we wait on futures. The problem being fixed 
> by this patch looks something more like:
>     
>     ...
>     I0425 13:37:03.516690 30336 hierarchical_allocator_process.hpp:423] 
> Removed slave 201304251337-16842879-37747-30328-0
>     I0425 13:37:03.516121 30334 slave.cpp:486] Slave asked to shut down by 
> [email protected]:37747
>     I0425 13:37:03.517014 30334 slave.cpp:1099] Asked to shut down framework 
> 201304251337-16842879-37747-30328-0000 by [email protected]:37747
>     W0425 13:37:03.517163 30334 slave.cpp:1120] Ignoring shutdown framework 
> 201304251337-16842879-37747-30328-0000 because it is terminating
>     I0425 13:37:03.518982 30334 slave.cpp:1867] [email protected]:37747 exited
>     [Thread 0x7fffb0cff700 (LWP 30378) exited]
>     W0425 13:37:03.519126 30334 slave.cpp:1870] Master disconnected! Waiting 
> for a new master to be elected
>     [Thread 0x7fffabfff700 (LWP 30379) exited]
>     I0425 13:37:03.521286 30328 slave.cpp:441] Slave terminating
>     I0425 13:37:03.521467 30328 slave.cpp:1099] Asked to shut down framework 
> 201304251337-16842879-37747-30328-0000 by @0.0.0.0:0
>     W0425 13:37:03.521649 30328 slave.cpp:1120] Ignoring shutdown framework 
> 201304251337-16842879-37747-30328-0000 because it is terminating
>     [Thread 0x7fffb1d01700 (LWP 30374) exited]
>     [Thread 0x7fffab7fe700 (LWP 30376) exited]
>     
>     Program received signal SIGPIPE, Broken pipe.
>     [Switching to Thread 0x7fffb230e700 (LWP 30370)]
>     0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
>     (gdb) where
>     #0  0x00007ffff6724ccd in write () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
>     #1  0x00007fffb2316ab2 in Java_sun_nio_ch_FileDispatcherImpl_write0 () 
> from /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/libnio.so
>     #2  0x00007fffcc39ef90 in ?? ()
>     #3  0x0000000000000000 in ?? ()
>     
>     That error is occurring somewhere deep down inside ZooKeeper, and I don't 
> really know what's causing it, other than obviously something related to the 
> slave not being fully shut down yet when we shut down the master. If it seems 
> important to you, I can continue investigating, but I suspect that it's a 
> quirk of how our test infrastructure interacts with ZooKeeper, and I don't 
> think that it's likely to come up in practice.
> 
> Benjamin Hindman wrote:
>     So, I don't see an error in the output that you've shown. When running 
> tests with gdb and a JVM (which you get with ZooKeeper tests) you need to 
> ignore most signals since that's the JVM's normal operation. I did see a 
> segfault that the JVM propagated in a previous test run, but it's not clear 
> to me why this fix will cover that segfault ...
> 
> Thomas Marshall wrote:
>     Sorry, I didn't realize that was expected behavior. In that case, the 
> problem this patch is solving looks something more like:
>     
>     ...
>     I0426 10:29:39.232862 12073 master.cpp:774] Asked to unregister framework 
> 201304261029-16842879-56235-12059-0000
>     I0426 10:29:39.233058 12073 master.hpp:300] Removing task with resources 
> cpus=1; mem=500 on slave 201304261029-16842879-56235-12059-0
>     I0426 10:29:39.233082 12079 hierarchical_allocator_process.hpp:359] 
> Deactivated framework 201304261029-16842879-56235-12059-0000
>     I0426 10:29:39.233254 12073 master.hpp:318] Removing offer with resources 
> cpus=1; mem=524; ports=[31000-32000]; disk=9053 on slave 
> 201304261029-16842879-56235-12059-0
>     I0426 10:29:39.233487 12079 hierarchical_allocator_process.hpp:544] 
> Recovered cpus=1; mem=500 (total allocatable: cpus=1; mem=500; ports=[]; 
> disk=0) on slave 201304261029-16842879-56235-12059-0 from framework 
> 201304261029-16842879-56235-12059-0000
>     I0426 10:29:39.233636 12073 master.cpp:477] Master terminating
>     I0426 10:29:39.233777 12059 master.cpp:283] Shutting down master
>     I0426 10:29:39.233770 12079 hierarchical_allocator_process.hpp:544] 
> Recovered cpus=1; mem=524; ports=[31000-32000]; disk=9053 (total allocatable: 
> cpus=2; mem=1024; ports=[31000-32000]; disk=9053) on slave 
> 201304261029-16842879-56235-12059-0 from framework 
> 201304261029-16842879-56235-12059-0000
>     I0426 10:29:39.233077 12074 slave.cpp:1099] Asked to shut down framework 
> 201304261029-16842879-56235-12059-0000 by [email protected]:56235
>     W0426 10:29:39.234302 12074 slave.cpp:1120] Ignoring shutdown framework 
> 201304261029-16842879-56235-12059-0000 because it is terminating
>     I0426 10:29:39.234310 12078 hierarchical_allocator_process.hpp:423] 
> Removed slave 201304261029-16842879-56235-12059-0
>     I0426 10:29:39.234422 12074 slave.cpp:486] Slave asked to shut down by 
> [email protected]:56235
>     I0426 10:29:39.234792 12074 slave.cpp:1099] Asked to shut down framework 
> 201304261029-16842879-56235-12059-0000 by [email protected]:56235
>     W0426 10:29:39.234848 12074 slave.cpp:1120] Ignoring shutdown framework 
> 201304261029-16842879-56235-12059-0000 because it is terminating
>     I0426 10:29:39.234915 12074 slave.cpp:441] Slave terminating
>     I0426 10:29:39.234956 12074 slave.cpp:1099] Asked to shut down framework 
> 201304261029-16842879-56235-12059-0000 by @0.0.0.0:0
>     W0426 10:29:39.234998 12074 slave.cpp:1120] Ignoring shutdown framework 
> 201304261029-16842879-56235-12059-0000 because it is terminating
>     pure virtual method called
>     terminate called without an active exception
>     Aborted (core dumped)

I've seen this "pure virtual method called" error before, but I'm unsure of the 
cause. Do you know why/how your patch fixes this?
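
For reference, one common way to hit this message (not necessarily what is 
happening in this test) is a virtual call that reaches an object while its 
base-class destructor is running, for example from another thread that still 
holds a pointer to it. By then the derived part is gone and the call is 
dispatched to the base class. The sketch below uses made-up names (Base, 
Derived, record, nameSeenDuringDestruction), none of which come from Mesos, 
and a non-pure virtual so that it runs to completion:

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only; not taken from Mesos.
struct Base {
  explicit Base(std::string* log) : log_(log) {}

  // By the time this runs, the Derived part of the object is already gone.
  virtual ~Base() { record(); }

  virtual std::string name() const { return "Base"; }

  // Non-virtual helper that makes a virtual call on the current object.
  void record() { *log_ = name(); }

 private:
  std::string* log_;
};

struct Derived : Base {
  using Base::Base;
  std::string name() const override { return "Derived"; }
};

// Returns whichever name() the Base destructor ended up calling.
std::string nameSeenDuringDestruction() {
  std::string log;
  {
    Derived d(&log);
    d.record();
    assert(log == "Derived");  // While fully alive, dispatch reaches Derived.
  }  // ~Derived runs first, then ~Base calls record() again.
  return log;
}
```

nameSeenDuringDestruction() returns "Base" here. If name() were declared pure 
virtual in Base (= 0), the same call would be undefined behavior, and with 
GCC's runtime it typically prints "pure virtual method called" and then 
terminates, which matches the last lines of the log above.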


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10786/#review19724
-----------------------------------------------------------


On April 25, 2013, 8:05 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10786/
> -----------------------------------------------------------
> 
> (Updated April 25, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> This addresses bug MESOS-441.
>     https://issues.apache.org/jira/browse/MESOS-441
> 
> 
> Diffs
> -----
> 
>   src/tests/allocator_zookeeper_tests.cpp 2c7deb1 
> 
> Diff: https://reviews.apache.org/r/10786/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 
> --gtest_filter=*AllocatorZoo*
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>
