Hi Igor,

It is common for databases, filesystems, and other similar programs to require a formatting step before they are used. For example, postgres requires you to run initdb. Linux requires you to run mkfs before using a filesystem. Windows requires you to run "format c:", or something equivalent. Ceph requires you to run the ceph-deploy tool or a similar tool. It's really not a high operational burden because it only has to be done once when the system is initialized.
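As a rough illustration of the pattern, here is a minimal Python sketch of a one-time format step plus a startup check. It is not how Kafka, postgres, or any of the tools above actually implement this; the marker file name, the function names, and the error type are all made up for the example.

```python
import json
from pathlib import Path

MARKER = "storage-meta.json"  # hypothetical marker file name


class NotFormattedError(RuntimeError):
    """Raised when a storage directory has never been formatted."""


def format_storage(directory: Path, cluster_id: str) -> None:
    """One-time initialization: create the directory and write the marker."""
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / MARKER
    if marker.exists():
        raise FileExistsError(f"{directory} is already formatted")
    marker.write_text(json.dumps({"cluster_id": cluster_id}))


def open_storage(directory: Path) -> str:
    """Normal startup: refuse to run on a directory without the marker."""
    marker = directory / MARKER
    if not marker.exists():
        raise NotFormattedError(
            f"{directory} is not formatted; run the format step first"
        )
    return json.loads(marker.read_text())["cluster_id"]
```

With this split, starting on a blank directory fails loudly instead of being mistaken for a brand-new system.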
With a clearly defined initialization step, you can distinguish disk problems from the first startup of a cluster. This is actually quite important to the correctness of the system. For example, if I start up two out of three Raft nodes and their disks erroneously show up as blank, I could elect a leader with an empty log. In that case, I've silently lost all the metadata in the system.

In general, there is a bootstrapping problem where brokers may not be able to connect to the controller quorum without first having some local metadata. For example, if you are managing users using SCRAM, the SCRAM principal for the broker needs to exist before the connection can be made. We call this "bootstrapping" because it requires you to "lift yourself up by your own bootstraps." You need the metadata to fetch the metadata. The explicit initialization step breaks the cycle and allows the cluster to be successfully created.

I agree that in testing, it is nice not to have to run a separate command. To facilitate this, we could have a bash script that allows developers to start up a single-node cluster without running kafka-storage.sh. That might be helpful. I suppose a Docker image is another way to do it, which might also help people test.

best,
Colin

On Mon, Nov 29, 2021, at 12:20, Igor Soarez wrote:
> Hi all,
>
> Bumping this thread as it’s been a while.
>
> Looking forward to any kind of feedback, please take a look.
>
> I created a short PR with a possible implementation -
> https://github.com/apache/kafka/pull/11549
>
> --
> Igor
>
>
>
>> On 18 Oct 2021, at 15:11, Igor Soarez <soa...@apple.com.INVALID> wrote:
>>
>> Hi all,
>>
>> I'd like to propose that we simplify the operation of KRaft servers a bit by
>> removing the requirement to run kafka-storage.sh for new storage directories.
>>
>> Please take a look at the KIP and provide your feedback:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-785%3A+Automatic+storage+formatting
>>
>> --
>> Igor
>>