While perusing the replicated log code for an upcoming talk, I realized we could drop ZooKeeper for leader election and use the Mesos replicated log directly. This works because a scheduler must be the leader to write anything to the log, so the ZK-based leader election could be replaced with an attempt to write a noop to the replicated log, as in [1]. This would remove one of the three places we use ZK. The other two are discovery of log replicas (done via libmesos, and already optional) and announcement to ZooKeeper for service discovery by finagle clients. The announcement could also be made optional, as some operators use alternative load balancers.
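To make the idea concrete, here is a toy in-memory model of "try to write a noop, and you are the leader if the write succeeds". It is only a sketch: `ToyReplicatedLog`, `ToyWriter`, `NotLeaderError` and `is_leader` are hypothetical names, not the Mesos or Aurora API, and the single epoch counter merely stands in for the log's real coordinator election.

```python
class NotLeaderError(Exception):
    """Raised when a writer that has been superseded tries to append."""


class ToyReplicatedLog:
    """In-memory stand-in for a log that only accepts writes from its latest writer."""

    def __init__(self):
        self.entries = []
        self.current_epoch = 0  # highest epoch handed out so far

    def open_writer(self):
        # Opening a new writer bumps the epoch, which fences off all older writers.
        self.current_epoch += 1
        return ToyWriter(self, self.current_epoch)


class ToyWriter:
    def __init__(self, log, epoch):
        self.log = log
        self.epoch = epoch

    def append(self, payload):
        # Only the writer holding the newest epoch may append.
        if self.epoch != self.log.current_epoch:
            raise NotLeaderError("writer epoch %d is stale" % self.epoch)
        self.log.entries.append((self.epoch, payload))
        return len(self.log.entries) - 1  # position of the new entry


NOOP = b""  # placeholder payload; a real noop record would be log-specific


def is_leader(writer):
    """Leadership check: appending a noop succeeds only for the current leader."""
    try:
        writer.append(NOOP)
        return True
    except NotLeaderError:
        return False
```

In this model a scheduler that opens a writer and appends a noop successfully treats itself as leader. Once another scheduler opens a newer writer, the old one finds out the next time its write is rejected.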
Changing the leader election logic to use the replicated log is compatible with other efforts to replace the hand-rolled LogStorage with a real SQL database. In any serious candidate database, leader election could be implemented with a native locking function, such as PostgreSQL advisory locks [2] or MySQL's GET_LOCK [3].

[1] https://github.com/apache/aurora/blob/827b9abea48babe53ad5b2c521757c60f04c6dfc/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java#L233-L241
[2] http://www.postgresql.org/docs/9.1/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
[3] http://dev.mysql.com/doc/refman/5.5/en/miscellaneous-functions.html#function_get-lock
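
As a rough illustration of the database variant, the sketch below tries to take a PostgreSQL session-level advisory lock with `pg_try_advisory_lock` (one of the functions documented in [2]). It assumes a DB-API 2.0 connection from a driver that uses `%s` placeholders, such as psycopg2. The function name `try_acquire_leadership` and the `lock_key` value are made up for the example.

```python
def try_acquire_leadership(conn, lock_key):
    """Return True if this session now holds the advisory lock for lock_key.

    Assumes `conn` is a DB-API 2.0 connection to PostgreSQL whose driver
    uses the "%s" parameter style (psycopg2 does). `lock_key` is an
    arbitrary integer that all schedulers agree on (hypothetical).
    """
    cur = conn.cursor()
    try:
        # pg_try_advisory_lock returns immediately: true if the lock was
        # obtained, false if another session already holds it.
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
        row = cur.fetchone()
        return bool(row[0])
    finally:
        cur.close()
```

Because the lock is tied to the database session, the leader would keep that connection open for as long as it leads. If the process dies and the session ends, PostgreSQL releases the lock and another scheduler can acquire it.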
