[
https://issues.apache.org/jira/browse/MESOS-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044957#comment-14044957
]
Jie Yu commented on MESOS-1546:
-------------------------------
> This solution is fine, but it seems to me to be papering over the issue
> instead of fixing the root cause.
I think this is a fine short-term mitigation. The real fix is to support
replicated log reconfiguration, but that is quite complex and would take months
to implement.
> Once we have a whitelist, do we need to use ZK for discovery?
Yes, we still need to use ZK for discovery. That way, we can still swap a
master host or increase the master quorum without shutting down all of the
masters.
> Is there any other way we can capture an unexpected master, perhaps by
> setting the expected number of masters explicitly?
Setting the number is not enough because we can run into situations where a 4th
(unexpected) master joins the group while one master is down, so the group
still has the expected number of members.
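
To illustrate the point, here is a minimal Python sketch (not Mesos code; the
host names and both helper functions are hypothetical) that compares a
count-based check with a whitelist check for the scenario described: three
expected masters, one of them down, and an unexpected fourth one joining.

```python
# Hypothetical illustration only; none of these names exist in Mesos.
EXPECTED_MASTERS = {"master1", "master2", "master3"}


def count_check(members, expected_count):
    """Accept the group as long as it is not larger than expected."""
    return len(members) <= expected_count


def whitelist_check(members, whitelist):
    """Accept the group only if every member is on the whitelist."""
    return set(members) <= set(whitelist)


# master3 is down and an unexpected master4 has joined the group.
members = {"master1", "master2", "master4"}

print(count_check(members, len(EXPECTED_MASTERS)))  # True: the count looks fine
print(whitelist_check(members, EXPECTED_MASTERS))   # False: master4 is caught
```

The count-based check sees three members and passes, while the whitelist check
rejects the group because it looks at identities rather than at the group size.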
> Introduce an optional master whitelist for replicated log based registrar.
> --------------------------------------------------------------------------
>
> Key: MESOS-1546
> URL: https://issues.apache.org/jira/browse/MESOS-1546
> Project: Mesos
> Issue Type: Improvement
> Components: master, replicated log
> Reporter: Jie Yu
>
> When using the replicated log as the storage back-end for the registrar, we
> currently rely on ZooKeeper to discover replicas (see ZooKeeperNetwork in
> src/log/network.hpp). We simply broadcast Paxos messages to all replicas in
> the ZooKeeperNetwork.
> There is a security concern with this approach. For example, say initially
> there are 3 masters and the quorum size is 2. Now, if a 4th master is
> accidentally added and joins the ZooKeeperNetwork, we will then operate with
> 4 replicas and quorum size 2, so two disjoint pairs of replicas can each form
> a quorum. This could lead to inconsistency in the replicated log (and thus
> the registrar).
> The idea here is to introduce a whitelist for masters. We still use
> ZooKeeperNetwork to discover replicas. However, when broadcasting Paxos
> messages in the replicated log, we check the whitelist and make sure we don't
> send Paxos messages to a master that is not in this whitelist.
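
The whitelist idea in the issue description above can be sketched roughly as
follows. This is a hedged Python illustration, not the Mesos implementation:
`broadcast_targets`, `quorums_always_overlap`, and the host names are all
hypothetical. It filters the discovered replicas through an optional whitelist
and shows why 4 replicas with quorum size 2 is unsafe.

```python
# Hypothetical illustration only; none of these names exist in Mesos.

def broadcast_targets(discovered, whitelist=None):
    """Return the replicas that should receive a Paxos message.

    The whitelist is optional: when it is None, every discovered
    replica is a target, which mirrors the current behavior.
    """
    if whitelist is None:
        return set(discovered)
    return set(discovered) & set(whitelist)


def quorums_always_overlap(replicas, quorum):
    """Any two quorums share at least one replica iff 2 * quorum > replicas."""
    return 2 * quorum > replicas


discovered = {"master1", "master2", "master3", "master4"}
whitelist = {"master1", "master2", "master3"}

print(sorted(broadcast_targets(discovered, whitelist)))
# ['master1', 'master2', 'master3']

print(quorums_always_overlap(3, 2))  # True: any two quorums intersect
print(quorums_always_overlap(4, 2))  # False: two disjoint quorums are possible
```

Discovery still reports all four hosts, but only the three whitelisted ones
receive Paxos messages, so the effective replica set stays at 3 with quorum
size 2.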