[ 
https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572077#comment-14572077
 ] 

Marco Massenzio commented on MESOS-2340:
----------------------------------------

h2. Preliminary testing

To test the viability of the approach, I built three different versions of Mesos:

1. check out {{master}}, build (in directory {{build-0.22/}});
2. check out {{MESOS-2340_zk_json}}, build (in directory {{build-0.23/}}); and
3. same branch, adding changes to {{detector.cpp}}, build (in directory {{build-0.24/}}).

Then, in each directory in turn, I ran the masters with some variations on:

{code}
./bin/mesos-master.sh --ip=127.0.0.1 --port=5055 \
    --zk=zk://localhost:2181/test/mesos \
    --quorum=1 --work_dir=/tmp/mesos23.1 &
{code}

changing ports and working directories as appropriate.
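
The variations can be scripted rather than typed by hand. What follows is only a minimal sketch: the helper name {{master_command}} and the {{/tmp/mesos-<port>}} working directories are illustrative assumptions, not what was used in these tests; the flags themselves are the ones shown in the command above.

```python
def master_command(port, work_dir,
                   zk="zk://localhost:2181/test/mesos",
                   ip="127.0.0.1", quorum=1):
    """Build the argument list for one mesos-master.sh invocation.

    Only the port and the working directory vary between masters;
    the remaining flags default to the values used in the tests.
    """
    return [
        "./bin/mesos-master.sh",
        f"--ip={ip}",
        f"--port={port}",
        f"--zk={zk}",
        f"--quorum={quorum}",
        f"--work_dir={work_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical ports and working directories, one per master.
    for port in (5050, 5051, 5055):
        print(" ".join(master_command(port, f"/tmp/mesos-{port}")))
```

Each printed line can be run from the relevant build directory; passing the list to {{subprocess.Popen}} instead would start the master in the background, as the trailing {{&}} does in the shell.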

h2. Compatibility 0.22-0.23

- if the 0.22 starts first, it succeeds and gets elected; the 0.23 detects it and logs a warning (line breaks added for legibility):

{noformat}
W0603 19:41:35.631258 24630 detector.cpp:444] Leading master 
[email protected]:5050 is using 
   a deprecated binary format when registering with Zookeper (info): this is 
deprecated as of 
   Mesos 0.23; please update.
{noformat}

- equally, when a {{0.23}} starts first, it becomes leader and detects itself as writing in a deprecated format:

{noformat}
I0603 20:14:19.626807  2966 contender.cpp:247] New candidate (id='13') has 
entered the contest for leadership
...
I0603 20:14:19.627610  2968 detector.cpp:138] Detected a new leader: (id='13')
I0603 20:14:19.627759  2965 network.hpp:463] ZooKeeper group PIDs: { 
log-replica(1)@127.0.0.1:5050 }
I0603 20:14:19.627869  2972 group.cpp:659] Trying to get 
'/test/mesos/info_0000000013' in ZooKeeper
W0603 20:14:19.628669  2970 detector.cpp:444] Leading master 
[email protected]:5050 is using a deprecated binary format when registering with 
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
{noformat}

- The "old" 0.22 Master (#14) enters the leadership contest, finds that #13 is already the leader, and correctly detects it; heading to http://localhost:5051 redirects to port 5050:

{noformat}
I0603 20:17:14.579074  3068 contender.cpp:247] New candidate (id='14') has 
entered the contest for leadership
I0603 20:17:14.579385  3065 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000013' in ZooKeeper
I0603 20:17:14.579419  3057 detector.cpp:452] A new leading master 
([email protected]:5050) is detected
I0603 20:17:14.579674  3061 master.cpp:1474] The newly elected leader is 
[email protected]:5050 with id 20150603-201419-16777343-5050-2937
{noformat}

h2. Compatibility 0.23-0.24

Finally, I tested a putative "{{0.24}}" Master - one that writes only JSON, 
running alongside a {{0.23}}:

- start a 0.23 (#15) first:

{noformat}
I0603 20:24:46.154786  3481 contender.cpp:247] New candidate (id='15') has 
entered the contest for leadership
I0603 20:24:46.155068  3486 network.hpp:415] ZooKeeper group memberships changed
I0603 20:24:46.155199  3482 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:24:46.155499  3485 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:24:46.155822  3478 group.cpp:659] Trying to get 
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:24:46.156153  3477 network.hpp:463] ZooKeeper group PIDs: { 
log-replica(1)@127.0.0.1:5050 }
W0603 20:24:46.156290  3480 detector.cpp:444] Leading master 
[email protected]:5050 is using a deprecated binary format when registering with 
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:24:46.156337  3480 detector.cpp:476] A new leading master 
([email protected]:5050) is detected
I0603 20:24:46.156466  3482 master.cpp:1474] The newly elected leader is 
[email protected]:5050 with id 20150603-202446-16777343-5050-3457
I0603 20:24:46.156497  3482 master.cpp:1487] Elected as the leading master!
{noformat}

- then start a "0.24" (#16); it complains, correctly, about the format, but 
accepts #15 as the leader:

{noformat}
I0603 20:26:30.410413  3726 group.cpp:313] Group process 
(group(4)@127.0.0.1:5051) connected to ZooKeeper
I0603 20:26:30.410434  3726 group.cpp:790] Syncing group operations: queue size 
(joins, cancels, datas) = (0, 0, 0)
I0603 20:26:30.410449  3726 group.cpp:385] Trying to create path '/test/mesos' 
in ZooKeeper
I0603 20:26:30.418521  3727 network.hpp:415] ZooKeeper group memberships changed
I0603 20:26:30.418741  3729 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:26:30.426359  3725 contender.cpp:247] New candidate (id='16') has 
entered the contest for leadership
I0603 20:26:30.426548  3722 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:26:30.426669  3475 network.hpp:415] ZooKeeper group memberships changed
I0603 20:26:30.426764  3485 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:26:30.426789  3729 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:26:30.426946  3727 group.cpp:659] Trying to get 
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:26:30.427489  3485 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000015' in ZooKeeper
W0603 20:26:30.427688  3728 detector.cpp:444] Leading master 
[email protected]:5050 is using a deprecated binary format when registering with 
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:26:30.427810  3728 detector.cpp:476] A new leading master 
([email protected]:5050) is detected
{noformat}

- we can also start another 0.24 (#17); same outcome:

{noformat}
I0603 20:29:03.438576  3807 contender.cpp:247] New candidate (id='17') has 
entered the contest for leadership
I0603 20:29:03.438606  3474 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:29:03.438629  3722 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000014' in ZooKeeper
I0603 20:29:03.438904  3807 detector.cpp:138] Detected a new leader: (id='15')
I0603 20:29:03.439270  3807 group.cpp:659] Trying to get 
'/test/mesos/info_0000000015' in ZooKeeper
I0603 20:29:03.439407  3806 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.439627  3474 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.439906  3722 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000015' in ZooKeeper
I0603 20:29:03.440120  3474 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000016' in ZooKeeper
I0603 20:29:03.440156  3806 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000016' in ZooKeeper
W0603 20:29:03.440443  3809 detector.cpp:444] Leading master 
[email protected]:5050 is using a deprecated binary format when registering with 
Zookeper (info): this is deprecated as of Mesos 0.23; please update.
I0603 20:29:03.440495  3722 group.cpp:659] Trying to get 
'/test/mesos/log_replicas/0000000016' in ZooKeeper
I0603 20:29:03.440541  3809 detector.cpp:476] A new leading master 
([email protected]:5050) is detected
{noformat}

(see *Conclusions* below for some considerations about format compatibility).

- If we now kill the Leader (the 0.23 version), a new Leader gets elected and all looks fine.

h3. Starting a 0.24 first

With two 0.24 servers running, we start a 0.23:

{noformat}
I0603 20:42:37.767892  4134 contender.cpp:247] New candidate (id='18') has 
entered the contest for leadership
...
I0603 20:42:37.768111  4126 detector.cpp:138] Detected a new leader: (id='16')
...
I0603 20:42:37.769239  4124 detector.cpp:449] Detected a JSON MasterInfo data; 
this is the new format as of Mesos 0.23

I0603 20:42:37.769870  4124 detector.cpp:476] A new leading master 
([email protected]:5051) is detected
I0603 20:42:37.769944  3726 network.hpp:463] ZooKeeper group PIDs: { 
log-replica(1)@127.0.0.1:5051, log-replica(1)@127.0.0.1:5052, 
log-replica(1)@127.0.0.1:5055 }
I0603 20:42:37.770009  4132 master.cpp:1474] The newly elected leader is 
[email protected]:5051 with id 20150603-202630-16777343-5051-3705
{noformat}

It correctly detects the new format, finds the (existing) leader and carries on. Incidentally, the log entry citing a "newly elected leader" is grossly misleading here, since that leader was already in place before the 0.23 started.

Obviously, heading to {{localhost:5055}} redirects to {{localhost:5051}}; and, if the latter gets killed, a new Leader (still a 0.24 one) gets elected and correctly detected.

h2. Conclusions

This is an *extremely* limited set of tests, but it suggests that this approach could be made to work.

As a matter for further discussion, please note that ZooKeeper now holds data in *mixed formats*, binary and JSON (this seems to work, but should we "force" a binary format on 0.24 servers if they detect a deprecated format? e.g., using a {{--force-zk-compatibility}} flag?):

{noformat}
[zk: localhost:2181(CONNECTED) 33] ls /test/mesos
[info_0000000015, log_replicas, json.info_0000000016, json.info_0000000017]
{noformat}
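
To make the mixed-format situation concrete, here is a small sketch of how a client might sort out the children of {{/test/mesos}}. It rests on assumptions drawn only from the listing and logs above: that binary entries are labelled {{info_<seq>}}, that JSON entries are labelled {{json.info_<seq>}}, and that the candidate with the lowest sequence number is the leader (which is consistent with #15, then #16, leading in the tests). The function names are hypothetical and this is not the logic in {{detector.cpp}}.

```python
import re

# Assumed naming: "info_<seq>" for binary, "json.info_<seq>" for JSON.
_ZNODE = re.compile(r"^(json\.info|info)_(\d+)$")


def classify(children):
    """Map each master znode name to (sequence number, format).

    Children that are not master entries (e.g. "log_replicas")
    are skipped.
    """
    nodes = {}
    for name in children:
        match = _ZNODE.match(name)
        if match:
            fmt = "json" if match.group(1) == "json.info" else "binary"
            nodes[name] = (int(match.group(2)), fmt)
    return nodes


def leader(children):
    """Return the znode with the lowest sequence number, or None."""
    nodes = classify(children)
    if not nodes:
        return None
    return min(nodes, key=lambda name: nodes[name][0])


if __name__ == "__main__":
    children = ["info_0000000015", "log_replicas",
                "json.info_0000000016", "json.info_0000000017"]
    print(leader(children), classify(children))
```

Under these assumptions the format of the leader's entry is known before its contents are read, so a client could choose between a protobuf and a JSON decoder from the label alone.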


> Publish JSON in ZK instead of serialized MasterInfo
> ---------------------------------------------------
>
>                 Key: MESOS-2340
>                 URL: https://issues.apache.org/jira/browse/MESOS-2340
>             Project: Mesos
>          Issue Type: Improvement
>          Components: leader election
>            Reporter: Zameer Manji
>            Assignee: Marco Massenzio
>
> Currently to discover the master a client needs the ZK node location and 
> access to the MasterInfo protobuf so it can deserialize the binary blob in 
> the node.
> I think it would be nice to publish JSON (like Twitter's ServerSets) so 
> clients are not tied to protobuf to do service discovery.



