Hello community,

I came across this blog post:

      https://banzaicloud.com/blog/kafka-on-etcd/

And I thought it would be a good idea to discuss the criticism as a community. 
Let me copy the points here and add some notes:

        • Unlike Kafka it does not have a vibrant and huge community (merge 
those PR’s please, anyone?)
I have personally met and worked with a lot of great people in this community 
over the years, and as such, I probably have a pretty biased view. But, it is a 
common concern that we are not fast enough at responding. We also don't have 
conferences and large meetups compared to other communities. Are those really 
necessary, though? What can we do to be a better community?

        • It uses a protocol which is hard to understand and it’s hard to 
maintain a large Zookeeper cluster
I can't really speak to the "hard to understand" part, and I don't understand
what "maintain a large ZooKeeper cluster" is referring to. How large is it and
why do we need it to be large? We have features like observers that enable
large clusters, but whether they solve the problem depends on what the authors
are after.
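
For anyone not familiar with observers, here is a rough sketch of how one is configured. The hostnames and server ids are made up for illustration; only the peerType setting and the :observer suffix are the actual ZooKeeper options.

```
# In the observer's own zoo.cfg:
peerType=observer

# In the zoo.cfg of every server (voters and observers alike).
# Hostnames and ids below are hypothetical.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888:observer
```

Observers receive committed updates and serve reads but do not vote, so adding them increases read capacity without enlarging the quorum that writes have to go through.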

        • It’s a bit outdated, compared say with Raft
When we wrote about Zab years back, we had as a goal to explain the protocol in 
a way that could be reproduced. We had other goals too, like explaining how we 
had been successful in implementing a system like ZooKeeper with that protocol, 
the properties it guaranteed and so on. Raft focused on the simplicity of 
understanding, which makes a lot of sense given that there was interest in 
reproducing it. Given its focus, and clearly the quality of the people behind 
it, Raft has been more successful in popularizing the implementation of 
replicated state machines. At a protocol level, however, I don't think there is 
anything that makes Zab outdated with respect to Raft.

        • It’s written in Java (yes, it’s opinionated but this is a problem for 
us as ZK is an infrastructure component)
This is arguable; there are pros and cons both ways.

        • We run everything in Kubernetes and k8s by default has an in-built 
Raft implementation, etcd
I can totally understand this point. No one wants to have to operate two 
systems doing similar things. To consolidate operations, it clearly makes sense 
to pick one. Ironically, this post talks about pluggability, but Kubernetes does
not really give the option of using zk rather than etcd if that's what I want 
to use.  

        • Linearizability (if there is a word like this) - check this 
comparison chart
We do provide linearizable reads with sync(), although I understand that it is 
arguable whether that is truly linearizable. There has been a long-running
discussion about whether we should make sync() truly linearizable by making it
a first-class txn. Back in the day, we didn't do that because we wanted reads
to be fast, so we implemented sync() in a way that doesn't go through the
whole pipeline of request processors but still reaches out to the leader.
See the issue for more detail: 
https://issues.apache.org/jira/browse/ZOOKEEPER-2136
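
To make the sync()-then-read pattern concrete, here is a toy model in Python. It is not ZooKeeper code: the Leader and Follower classes and their methods are invented for illustration, and the model deliberately ignores the corner cases (such as a leadership change) that ZOOKEEPER-2136 is about. It only shows why a plain read served by a lagging server can be stale, and why a read issued after sync() sees the latest committed write.

```python
class Leader:
    """Toy leader: holds the ordered log of committed updates."""

    def __init__(self):
        self.log = []  # list of (path, value) pairs, in commit order

    def write(self, path, value):
        self.log.append((path, value))


class Follower:
    """Toy follower: serves reads from local state, which may lag."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0  # number of leader log entries applied locally
        self.state = {}

    def catch_up(self, upto):
        # Apply leader log entries [applied, upto) to the local state.
        for path, value in self.leader.log[self.applied:upto]:
            self.state[path] = value
        self.applied = max(self.applied, upto)

    def read(self, path):
        # Served locally, with no round trip to the leader.
        return self.state.get(path)

    def sync(self):
        # Reach out to the leader and apply everything committed so far.
        self.catch_up(len(self.leader.log))


leader = Leader()
follower = Follower(leader)

leader.write("/config", "v1")
follower.catch_up(1)            # follower has seen v1
leader.write("/config", "v2")   # follower has not seen v2 yet

stale = follower.read("/config")   # local read may return the old value
follower.sync()
fresh = follower.read("/config")   # read after sync sees the latest commit
print(stale, fresh)
```

In the real client the equivalent is to call sync() on the path and then issue the read on the same session.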

        • Performance and inherent scalability issues
I don't know if those experiments were done using a dedicated device for the
txn log, which is well known to be important for zk's performance. Incremental
snapshotting is clearly a good way to reduce the amount of disk load for 
snapshots, but I wonder whether that's really a primary concern given that 
servers these days often have multiple devices.
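
As a reminder of what a dedicated txn log device looks like in practice, here is a minimal zoo.cfg sketch. The paths are hypothetical; the assumption is that the second path is a mount point on its own device.

```
# Snapshots go here (hypothetical path).
dataDir=/data/zookeeper

# Transaction log on a separate, dedicated device (hypothetical mount point).
dataLogDir=/txnlog/zookeeper
```

Without dataLogDir, the txn log is written under dataDir and competes with snapshot writes for the same device.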

I don't understand the max CPU utilization reported for zk
(https://coreos.com/blog/performance-of-etcd.html). Perhaps this is something 
to be investigated.

        • Client side complexity and thick clients
Due to the set of features we wanted to offer, we have indeed chosen this path. 

        • Lack of service discovery
I don't have a good sense of how many users are actually bothered by this. I 
have heard complaints over time about service discovery with ZooKeeper, but I'm 
not sure there was any conclusion about whether service discovery is a good use 
case for such coordination systems, including etcd for that matter.

Any feedback?

Thanks,
-Flavio
