Emanuele Cesena created KAFKA-4666:
--------------------------------------
Summary: Failure test for Kafka configured for consistency vs availability
Key: KAFKA-4666
URL: https://issues.apache.org/jira/browse/KAFKA-4666
Project: Kafka
Issue Type: Improvement
Reporter: Emanuele Cesena
Attachments: consistency_test.py
We recently had an issue with our Kafka setup caused by a misconfiguration.
In short, we thought we had configured Kafka for durability, but we had not set
the producers to acks=all. During a full outage, some partitions ended up
"partitioned": followers started without properly waiting for the right
leader, and as a result we lost data. To be clear, this is not an issue with
Kafka but a misconfiguration on our side.
We believe we have reproduced the issue: we built a Docker-based test showing
that, if the producer is not set to acks=all, data can be lost during an
almost-full outage. The test is attached (consistency_test.py).
I was thinking of sending a PR, but wanted to run this by you first, since the
test does not strictly prove that a feature works as expected.
In addition, I think the documentation could be slightly improved, for instance
in the section
http://kafka.apache.org/documentation/#design_ha
by stating clearly that there are three steps to configuring Kafka for
consistency, the third being that producers must be set to acks=all
(currently this is folded into the second point).
Please let me know what you think, and I can send a PR if you agree.
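The three steps could also be checked mechanically before a deployment. The following is a minimal sketch, not part of Kafka or of the attached test: `check_consistency_config` is a hypothetical helper, and it assumes that the first two steps are the topic-level settings the linked section already lists (unclean.leader.election.enable=false and min.insync.replicas), that a minimum ISR of 2 is the desired threshold, and that settings are passed as plain dicts of string values, as they would appear in a .properties file.

```python
"""Hypothetical checker for the three consistency settings (sketch only)."""
from typing import Dict, List


def check_consistency_config(
    topic_config: Dict[str, str], producer_config: Dict[str, str]
) -> List[str]:
    """Return a list of problems; an empty list means all three steps are set.

    Assumption: settings are plain string values, as in a .properties file.
    """
    problems: List[str] = []

    # Step 1: never elect an out-of-sync replica as leader.
    # A missing key is reported, since the default has differed across versions.
    if topic_config.get("unclean.leader.election.enable", "").lower() != "false":
        problems.append("unclean.leader.election.enable should be false")

    # Step 2: require more than one in-sync replica for a write to succeed.
    # The threshold of 2 is an assumption for this sketch; Kafka's default is 1.
    try:
        min_isr = int(topic_config.get("min.insync.replicas", "1"))
    except ValueError:
        min_isr = 0  # treat an unparsable value as unsafe
    if min_isr < 2:
        problems.append("min.insync.replicas should be at least 2")

    # Step 3: min.insync.replicas only takes effect if producers wait for
    # all in-sync replicas; "-1" is accepted as equivalent to "all".
    if producer_config.get("acks", "").lower() not in ("all", "-1"):
        problems.append("producer acks should be all")

    return problems


if __name__ == "__main__":
    # The misconfiguration described in this issue: topic settings are fine,
    # but the producer does not use acks=all.
    topic = {"unclean.leader.election.enable": "false", "min.insync.replicas": "2"}
    producer = {"acks": "1"}
    print(check_consistency_config(topic, producer))
    # → ['producer acks should be all']
```

Keeping the producer check separate from the topic checks mirrors the point above: the topic-level settings alone do not guarantee consistency unless every producer also uses acks=all.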
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)