Hey folks We recently came across a bug [1] which was very hard to detect during testing and easy to introduce during development. I would like to kick start a discussion on potential ways which could avoid this category of bugs in Apache Kafka.
I think we might want to start working towards a "debug" mode in the broker which will enable assertions for different invariants in Kafka. Invariants could be derived from formal verification that Jack [2] and others have shared with the community earlier AND from tribal knowledge in the community such as network threads should not perform any storage IO, files should not fsync in critical product path, metric gauges should not acquire a lock etc. The release qualification process (system tests + integration tests) will run the broker in "debug" mode and will validate these assertions while testing the system in different scenarios. The inspiration for this idea is derived from Marc Brooker's post at https://brooker.co.za/blog/2023/07/28/ds-testing.html Your thoughts on this topic are welcome! Also, please feel free to take this idea forward and draft a KIP for a more formal discussion. [1] https://issues.apache.org/jira/browse/KAFKA-15653 [2] https://lists.apache.org/thread/pfrkk0yb394l5qp8h5mv9vwthx15084j -- Divij Vaidya