[
https://issues.apache.org/jira/browse/MESOS-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932704#comment-13932704
]
Yan Xu commented on MESOS-1088:
-------------------------------
I suspect that the problem lies in the
[latch|https://github.com/apache/mesos/blob/ea1ce107bb2aadc947563f1b59c7d08d1b7125f3/3rdparty/libprocess/src/latch.cpp].
{code:title=latch.cpp}
void Latch::trigger()
{
  if (!triggered) {
    terminate(pid);
{code}
It's possible for {{process::wait(pid, duration)}} below to return and, in turn,
for {{Latch::await(...)}} to return {{false}} before the next line executes
(the waiter is woken by {{terminate(pid)}} while {{triggered}} is still
{{false}}), right?
{code:title=latch.cpp (continued)}
    triggered = true;
  }
}

bool Latch::await(const Duration& duration)
{
  if (!triggered) {
    process::wait(pid, duration); // Explicit to disambiguate.
    // It's possible that we failed to wait because:
    // (1) Our process has already terminated.
    // (2) We timed out (i.e., duration was not "infinite").
    // In the event of (1) we might need to return 'true' since a
    // terminated process might imply that the latch has been
    // triggered. To capture this we simply return the value of
    // 'triggered' (which will also capture cases where we actually
    // timed out but have since triggered, which seems like an
    // acceptable semantics given such a "tie").
    return triggered;
  }
  return true;
}
{code}
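If that ordering is indeed the cause, one possible way to close the window is to publish {{triggered}} before waking any waiter. The following is only a minimal sketch of that idea and is not libprocess code: {{SketchLatch}} and {{demo}} are hypothetical names, and a {{std::condition_variable}} stands in for {{terminate(pid)}} / {{process::wait(pid, duration)}}.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the latch (NOT the libprocess implementation).
class SketchLatch
{
public:
  // Returns true only for the caller that actually flipped the latch.
  bool trigger()
  {
    bool expected = false;
    // Set 'triggered' BEFORE waking waiters, so a woken waiter can
    // never observe 'triggered == false'.
    if (triggered.compare_exchange_strong(expected, true)) {
      std::lock_guard<std::mutex> lock(mutex);
      terminated = true; // Stands in for terminate(pid).
      cv.notify_all();
      return true;
    }
    return false;
  }

  bool await(std::chrono::milliseconds duration)
  {
    if (!triggered.load()) {
      std::unique_lock<std::mutex> lock(mutex);
      // Stands in for process::wait(pid, duration).
      cv.wait_for(lock, duration, [this] { return terminated; });
      // Still covers the "timed out but since triggered" tie.
      return triggered.load();
    }
    return true;
  }

private:
  std::atomic<bool> triggered{false};
  bool terminated = false; // Guarded by 'mutex'.
  std::mutex mutex;
  std::condition_variable cv;
};

// Usage: one thread triggers while another awaits with a timeout.
void demo()
{
  SketchLatch latch;
  std::thread t([&latch] { latch.trigger(); });
  bool awaited = latch.await(std::chrono::seconds(10));
  t.join();
  assert(awaited); // A wake-up always implies 'triggered' is set.
}
```

With this ordering, a waiter that wakes up because of the trigger always sees {{triggered == true}}; a latch that is never triggered still returns {{false}} after the timeout.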
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
> is flaky
> -----------------------------------------------------------------------------------------
>
> Key: MESOS-1088
> URL: https://issues.apache.org/jira/browse/MESOS-1088
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Yan Xu
> Assignee: Yan Xu
> Fix For: 0.19.0
>
>
> {code}
> [ RUN ]
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
> I0312 15:50:02.733414 2029 zookeeper_test_server.cpp:158] Started
> ZooKeeperTestServer on port 32925
> 2014-03-12 15:50:02,733:2029(0x7fc285609700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-03-12 15:50:02,733:2029(0x7fc285609700):ZOO_INFO@log_env@716: Client
> environment:host.name=fedora-20
> 2014-03-12 15:50:02,734:2029(0x7fc285609700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2014-03-12 15:50:02,734:2029(0x7fc285609700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.13.6-200.fc20.x86_64
> 2014-03-12 15:50:02,734:2029(0x7fc285609700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP Fri Mar 7 17:02:28 UTC 2014
> 2014-03-12 15:50:02,734:2029(0x7fc285609700):ZOO_INFO@log_env@733: Client
> environment:user.name=jenkins
> 2014-03-12 15:50:02,735:2029(0x7fc285609700):ZOO_INFO@log_env@741: Client
> environment:user.home=/home/jenkins
> 2014-03-12 15:50:02,735:2029(0x7fc285609700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/var/jenkins/workspace/vinod-test/compiler/clang/os/fedora-20/src
> 2014-03-12 15:50:02,735:2029(0x7fc285609700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=127.0.0.1:32925 sessionTimeout=10000
> watcher=0x7fc28df599f0 sessionId=0 sessionPasswd=<null>
> context=0x7fc264019490 flags=0
> I0312 15:50:02.738956 2050 contender.cpp:127] Joining the ZK group
> 2014-03-12 15:50:02,743:2029(0x7fc2532d1700):ZOO_INFO@check_events@1703:
> initiated connection to server [127.0.0.1:32925]
> 2014-03-12 15:50:02,750:2029(0x7fc2532d1700):ZOO_INFO@check_events@1750:
> session establishment complete on server [127.0.0.1:32925],
> sessionId=0x144b87cfc6c0000, negotiated timeout=10000
> I0312 15:50:02.752624 2051 group.cpp:310] Group process
> ((1177)@192.168.122.164:46605) connected to ZooKeeper
> I0312 15:50:02.752657 2051 group.cpp:778] Syncing group operations: queue
> size (joins, cancels, datas) = (1, 0, 0)
> I0312 15:50:02.752666 2051 group.cpp:382] Trying to create path '/mesos' in
> ZooKeeper
> I0312 15:50:02.770174 2052 contender.cpp:243] New candidate (id='0') has
> entered the contest for leadership
> I0312 15:50:02.773874 2051 detector.cpp:134] Detected a new leader: (id='0')
> I0312 15:50:02.774001 2051 group.cpp:655] Trying to get
> '/mesos/info_0000000000' in ZooKeeper
> I0312 15:50:02.778889 2051 detector.cpp:377] A new leading master
> ([email protected]:10000) is detected
> tests/master_contender_detector_tests.cpp:738: Failure
> Failed to wait 10secs for detected
> I0312 15:50:02.779384 2029 contender.cpp:182] Now cancelling the membership: 0
> 2014-03-12 15:50:02,780:2029(0x7fc28f738880):ZOO_INFO@zookeeper_close@2505:
> Closing zookeeper sessionId=0x144b87cfc6c0000 to [127.0.0.1:32925]
> I0312 15:50:02.784046 2029 zookeeper_test_server.cpp:122] Shutdown
> ZooKeeperTestServer on port 32925
> [ FAILED ]
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
> (55 ms)
> {code}
> Notice that only 55 ms elapsed for this test, even though the failure reports a 10 sec wait, and the Clock is not paused.