Hi Kafka Community,

I'm looking into the current state of KRaft disaster recovery, specifically for a "majority loss" scenario (e.g., 3 out of 5 controllers are irrecoverable).
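
To make the scenario concrete, here is a minimal sketch of the majority arithmetic, assuming the usual Raft-style rule that a leader needs votes from more than half of the configured voters. The function names are purely illustrative and are not part of any Kafka tool or API.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def can_elect_leader(voters: int, survivors: int) -> bool:
    """True if the surviving voters alone can still reach a majority."""
    return survivors >= majority(voters)


if __name__ == "__main__":
    voters, lost = 5, 3          # the scenario described above
    survivors = voters - lost    # 2 controllers remain
    print(f"majority needed: {majority(voters)}")
    print(f"survivors: {survivors}")
    print(f"can elect leader: {can_elect_leader(voters, survivors)}")
```

With 5 voters the majority is 3, so the 2 survivors cannot elect a leader without some change to the voter set.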

When the quorum is stuck with no leader, what’s the standard move to get the cluster back on its feet? I’m looking for the most reliable way to force a surviving node into a functional state to restore business continuity.

We have surviving disks and snapshots, and we're okay with some metadata loss if it simplifies the path to a working cluster. Are there specific tools or flags you'd recommend to override the voter state and bootstrap a new leader from the surviving metadata?

Appreciate any insights or "field-tested" advice on this.

Regards,
Abhijeet
