[
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572077#comment-14572077
]
Marco Massenzio commented on MESOS-2340:
----------------------------------------
h2. Preliminary testing
In order to test the viability of the approach, I have built three different
versions of Mesos:
1. checkout {{master}}, build (in directory {{build-0.22/}});
2. checkout {{MESOS-2340_zk_json}}, build (in directory {{build-0.23/}}); and
3. same branch, adding changes to {{detector.cpp}}, build (in directory
{{build-0.24/}}).
Then, in each directory in turn, I ran the masters with some variation of:
{code}
./bin/mesos-master.sh --ip=127.0.0.1 --port=5055 \
--zk=zk://localhost:2181/test/mesos \
--quorum=1 --work_dir=/tmp/mesos23.1 &
{code}
changing ports and working directories as appropriate.
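The variations are easy to script. As a minimal sketch, the following builds one command line per master; only the flags come from the command above, while the helper name {{master_command}} and the port/working-directory pairs in the loop are illustrative assumptions:

```python
def master_command(port, work_dir,
                   zk="zk://localhost:2181/test/mesos",
                   ip="127.0.0.1", quorum=1):
    """Build the argv for one local test master (hypothetical helper)."""
    return [
        "./bin/mesos-master.sh",
        f"--ip={ip}",
        f"--port={port}",
        f"--zk={zk}",
        f"--quorum={quorum}",
        f"--work_dir={work_dir}",
    ]


if __name__ == "__main__":
    # Illustrative pairs only; pick ports and directories that suit the test.
    for port, work_dir in [(5050, "/tmp/mesos23.0"), (5055, "/tmp/mesos23.1")]:
        print(" ".join(master_command(port, work_dir)))
```

Each list can be passed to {{subprocess.Popen}} from the relevant build directory to start the master in the background.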
h2. Compatibility 0.22-0.23
- if the 0.22 master starts first, it succeeds and gets elected; the 0.23 master detects it
and warns (line breaks added for legibility):
{noformat}
W0603 19:41:35.631258 24630 detector.cpp:444] Leading master
master@127.0.0.1:5050 is using
a deprecated binary format when registering with Zookeper (info): this is
deprecated as of
Mesos 0.23; please update.
{noformat}
- equally, when a {{0.23}} master starts first, it becomes leader and detects
itself as writing in a deprecated format:
{noformat}
I0603 20:14:19.626807 2966 contender.cpp:247] New candidate (id='13') has
entered the contest for leadership
...
I0603 20:14:19.627610 2968 detector.cpp:138] Detected a new leader: (id='13')
I0603 20:14:19.627759 2965 network.hpp:463] ZooKeeper group PIDs: {
log-replica(1)@127.0.0.1:5050 }
I0603 20:14:19.627869 2972 group.cpp:659] Trying to get
'/test/mesos/info_0000000013' in ZooKeeper
W0603 20:14:19.628669 2970 detector.cpp:444] Leading master
master@127.0.0.1:5050 is using a deprecated binary format when registering with
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
{noformat}
- the "old" 0.22 master (#14) then enters the leadership contest, finds that #13
is already the leader, and correctly detects it; heading to
http://localhost:5051 redirects to port 5050:
{noformat}
I0603 20:17:14.579074 3068 contender.cpp:247] New candidate (id='14') has
entered the contest for leadership
I0603 20:17:14.579385 3065 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000013' in ZooKeeper
I0603 20:17:14.579419 3057 detector.cpp:452] A new leading master
(master@127.0.0.1:5050) is detected
I0603 20:17:14.579674 3061 master.cpp:1474] The newly elected leader is
master@127.0.0.1:5050 with id 20150603-201419-16777343-5050-2937
{noformat}
h2. Compatibility 0.23-0.24
Finally, I tested a putative "{{0.24}}" Master - one that writes only JSON,
running alongside a {{0.23}}:
- start a 0.23 (#15) first:
{noformat}
I0603 20:24:46.154786 3481 contender.cpp:247] New candidate (id='15') has
entered the contest for leadership
I0603 20:24:46.155068 3486 network.hpp:415] ZooKeeper group memberships changed
I0603 20:24:46.155199 3482 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:24:46.155499 3485 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:24:46.155822 3478 group.cpp:659] Trying to get
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:24:46.156153 3477 network.hpp:463] ZooKeeper group PIDs: {
log-replica(1)@127.0.0.1:5050 }
W0603 20:24:46.156290 3480 detector.cpp:444] Leading master
master@127.0.0.1:5050 is using a deprecated binary format when registering with
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:24:46.156337 3480 detector.cpp:476] A new leading master
(master@127.0.0.1:5050) is detected
I0603 20:24:46.156466 3482 master.cpp:1474] The newly elected leader is
master@127.0.0.1:5050 with id 20150603-202446-16777343-5050-3457
I0603 20:24:46.156497 3482 master.cpp:1487] Elected as the leading master!
{noformat}
- then start a "0.24" (#16); it complains, correctly, about the format, but
accepts #15 as the leader:
{noformat}
I0603 20:26:30.410413 3726 group.cpp:313] Group process
(group(4)@127.0.0.1:5051) connected to ZooKeeper
I0603 20:26:30.410434 3726 group.cpp:790] Syncing group operations: queue size
(joins, cancels, datas) = (0, 0, 0)
I0603 20:26:30.410449 3726 group.cpp:385] Trying to create path '/test/mesos'
in ZooKeeper
I0603 20:26:30.418521 3727 network.hpp:415] ZooKeeper group memberships changed
I0603 20:26:30.418741 3729 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:26:30.426359 3725 contender.cpp:247] New candidate (id='16') has
entered the contest for leadership
I0603 20:26:30.426548 3722 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:26:30.426669 3475 network.hpp:415] ZooKeeper group memberships changed
I0603 20:26:30.426764 3485 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:26:30.426789 3729 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:26:30.426946 3727 group.cpp:659] Trying to get
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:26:30.427489 3485 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000015' in ZooKeeper
W0603 20:26:30.427688 3728 detector.cpp:444] Leading master
master@127.0.0.1:5050 is using a deprecated binary format when registering with
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:26:30.427810 3728 detector.cpp:476] A new leading master
(master@127.0.0.1:5050) is detected
{noformat}
- we can also start another 0.24 (#17); same outcome:
{noformat}
I0603 20:29:03.438576 3807 contender.cpp:247] New candidate (id='17') has
entered the contest for leadership
I0603 20:29:03.438606 3474 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:29:03.438629 3722 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:29:03.438904 3807 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:29:03.439270 3807 group.cpp:659] Trying to get
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:29:03.439407 3806 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.439627 3474 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.439906 3722 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.440120 3474 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000016' in ZooKeeper
I0603 20:29:03.440156 3806 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000016' in ZooKeeper
W0603 20:29:03.440443 3809 detector.cpp:444] Leading master
master@127.0.0.1:5050 is using a deprecated binary format when registering with
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:29:03.440495 3722 group.cpp:659] Trying to get
'/test/mesos/log_replicas/0000000016' in ZooKeeper
I0603 20:29:03.440541 3809 detector.cpp:476] A new leading master
(master@127.0.0.1:5050) is detected
{noformat}
(see *Conclusions* below for some considerations about format compatibility).
- If we now kill the Leader (the 0.23 version), the new Leader gets elected and
all looks fine.
h3. Starting a 0.24 first
With two 0.24 servers running, we start a 0.23:
{noformat}
I0603 20:42:37.767892 4134 contender.cpp:247] New candidate (id='18') has
entered the contest for leadership
...
I0603 20:42:37.768111 4126 detector.cpp:138] Detected a new leader: (id='16')
...
I0603 20:42:37.769239 4124 detector.cpp:449] Detected a JSON MasterInfo data;
this is the new format as of Mesos 0.23
I0603 20:42:37.769870 4124 detector.cpp:476] A new leading master
(master@127.0.0.1:5051) is detected
I0603 20:42:37.769944 3726 network.hpp:463] ZooKeeper group PIDs: {
log-replica(1)@127.0.0.1:5051, log-replica(1)@127.0.0.1:5052,
log-replica(1)@127.0.0.1:5055 }
I0603 20:42:37.770009 4132 master.cpp:1474] The newly elected leader is
master@127.0.0.1:5051 with id 20150603-202630-16777343-5051-3705
{noformat}
It correctly detects the new format, finds the (existing) leader (incidentally,
the log entry citing a "newly elected leader" is grossly misleading here, since
that leader was elected some time ago) and happily carries on its merry way.
Obviously, heading to {{localhost:5055}} redirects to {{localhost:5051}}; and,
if the latter gets killed, a new leader (still a 0.24 one) gets elected and is
correctly detected.
h2. Conclusions
This is an *extremely* limited set of tests, but it seems to indicate that this
approach could be made to work.
As a matter for further discussion, please note that we now have *mixed-format*
data in ZooKeeper, with binary and JSON nodes side by side (this seems to work,
but should we "force" a binary format on 0.24 servers if they detect a
deprecated format? e.g., using a {{--force-zk-compatibility}} flag?):
{noformat}
[zk: localhost:2181(CONNECTED) 33] ls /test/mesos
[info_0000000015, log_replicas, json.info_0000000016, json.info_0000000017]
{noformat}
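To make the listing above concrete, here is a minimal sketch of how a client could interpret those znode names. It assumes that the label before the sequence number identifies the encoding ({{info}} for the binary format, {{json.info}} for JSON, as in the listing) and that the candidate with the lowest sequence number is the leader, which is consistent with the logs in this comment. The function names are hypothetical, and this is not the code in {{detector.cpp}}:

```python
import re

# Label -> encoding, taken from the znode names in the listing above.
FORMATS = {"info": "binary", "json.info": "json"}

# A candidate znode is "<label>_<10-digit sequence number>".
ZNODE = re.compile(r"^(?P<label>.+)_(?P<seq>\d{10})$")


def parse_candidates(children):
    """Return sorted (sequence, encoding, name) tuples for election znodes.

    Children that are not election candidates (e.g. log_replicas) are skipped.
    """
    candidates = []
    for name in children:
        match = ZNODE.match(name)
        if match and match.group("label") in FORMATS:
            seq = int(match.group("seq"))
            candidates.append((seq, FORMATS[match.group("label")], name))
    return sorted(candidates)


def leader(children):
    """Assumed rule: the lowest sequence number leads; None if no candidates."""
    candidates = parse_candidates(children)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    children = ["info_0000000015", "log_replicas",
                "json.info_0000000016", "json.info_0000000017"]
    print(leader(children))
```

Under these assumptions the binary node #15 leads for as long as it exists, and removing it hands leadership to the JSON node #16, matching the failover observed in the tests.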
> Publish JSON in ZK instead of serialized MasterInfo
> ---------------------------------------------------
>
> Key: MESOS-2340
> URL: https://issues.apache.org/jira/browse/MESOS-2340
> Project: Mesos
> Issue Type: Improvement
> Components: leader election
> Reporter: Zameer Manji
> Assignee: Marco Massenzio
>
> Currently to discover the master a client needs the ZK node location and
> access to the MasterInfo protobuf so it can deserialize the binary blob in
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so
> clients are not tied to protobuf to do service discovery.