Hey folks

We recently came across a bug [1] which was very hard to detect during
testing and easy to introduce during development. I would like to kick
start a discussion on potential ways which could avoid this category of
bugs in Apache Kafka.

I think we might want to start working towards a "debug" mode in the broker
which will enable assertions for different invariants in Kafka. Invariants
could be derived from formal verification that Jack [2] and others have
shared with the community earlier AND from tribal knowledge in the
community such as network threads should not perform any storage IO, files
should not fsync in critical product path, metric gauges should not acquire
a lock etc. The release qualification  process (system tests + integration
tests) will run the broker in "debug" mode and will validate these
assertions while testing the system in different scenarios. The inspiration
for this idea is derived from Marc Brooker's post at
https://brooker.co.za/blog/2023/07/28/ds-testing.html

Your thoughts on this topic are welcome! Also, please feel free to take
this idea forward and draft a KIP for a more formal discussion.

[1] https://issues.apache.org/jira/browse/KAFKA-15653
[2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j

--
Divij Vaidya

Reply via email to