Hi Igor,

It is common for databases, filesystems, and other similar programs to require 
a formatting step before they are used. For example, postgres requires you to 
run initdb. Linux requires you to run mkfs before using a filesystem. Windows 
requires you to run "format c:", or something equivalent. Ceph requires you to 
run the ceph-deploy tool or a similar tool. It's really not a high operational 
burden because it only has to be done once when the system is initialized.

With an explicit initialization step, you can clearly distinguish disk 
problems from the first startup of a cluster. This is actually quite 
important to the correctness of the system. For example, if I start up two out 
of three Raft nodes and their disks erroneously show up as blank, I could elect 
a leader with an empty log. In that case, I've silently lost all the metadata 
in the system.
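
To make the distinction concrete, here is a rough sketch (in Python, purely 
for illustration; it is not the actual Kafka startup code) of a check that 
refuses to start on a directory that was never formatted. It assumes the 
formatting step leaves a meta.properties file in the storage directory; the 
function and exception names are hypothetical.

```python
import os


class UnformattedStorageError(Exception):
    """Raised when a storage directory shows no sign of having been formatted."""


def read_meta_properties(log_dir):
    """Return the key=value pairs from meta.properties, or None if it is absent."""
    path = os.path.join(log_dir, "meta.properties")
    if not os.path.isfile(path):
        return None
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments; keep simple key=value entries.
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
    return props


def check_storage(log_dir):
    """Fail fast on a blank directory instead of treating it as a new node."""
    props = read_meta_properties(log_dir)
    if props is None:
        # A blank directory on a node that should have data is a disk problem,
        # not a first startup, so do not silently join with an empty log.
        raise UnformattedStorageError(
            f"{log_dir} has no meta.properties; format it explicitly first"
        )
    return props
```

With a check like this, a node whose disk erroneously shows up blank stops at 
startup rather than taking part in an election with an empty log.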

In general, there is a bootstrapping problem where brokers may not be able to 
connect to the controller quorum without first having some local metadata. For 
example, if you are managing users using SCRAM, the SCRAM principal for the 
broker needs to exist before the connection can be made. We call this 
"bootstrapping" because it requires you to "lift yourself up by your own 
bootstraps." You need the metadata to fetch the metadata. The explicit 
initialization step breaks the cycle and allows the cluster to be successfully 
created.

I agree that in testing, it is nice not to have to run a separate command. To 
facilitate this, we could have a bash script that allows developers to start up 
a single node cluster without running kafka-storage.sh. That might be helpful. 
I suppose a docker image is another way to do it, which might also help people 
test.
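
As a rough sketch of what such a helper could look like (written in Python 
here rather than bash, and only an illustration), the wrapper below generates 
a cluster ID, formats the storage directory, and then starts the server. The 
function names and the paths passed in are hypothetical, and it assumes a 
fresh, unformatted directory.

```python
import os
import subprocess


def single_node_commands(kafka_home, config_path, cluster_id):
    """Build the format and start commands for a single-node dev cluster."""
    storage = os.path.join(kafka_home, "bin", "kafka-storage.sh")
    server = os.path.join(kafka_home, "bin", "kafka-server-start.sh")
    return [
        # Format the storage directories named in the config file.
        [storage, "format", "-t", cluster_id, "-c", config_path],
        # Start the server with the same config.
        [server, config_path],
    ]


def start_single_node(kafka_home, config_path):
    """Generate a cluster ID, format storage, and start the server."""
    storage = os.path.join(kafka_home, "bin", "kafka-storage.sh")
    # Ask kafka-storage.sh for a new random cluster ID.
    result = subprocess.run(
        [storage, "random-uuid"], check=True, capture_output=True, text=True
    )
    cluster_id = result.stdout.strip()
    for cmd in single_node_commands(kafka_home, config_path, cluster_id):
        # check=True stops the sequence if formatting fails.
        subprocess.run(cmd, check=True)
```

A developer would call start_single_node with their Kafka install directory 
and a KRaft config file. The explicit format step is still there, it is just 
hidden behind one command for testing.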

best,
Colin


On Mon, Nov 29, 2021, at 12:20, Igor Soarez wrote:
> Hi all,
>
> Bumping this thread as it’s been a while.
>
> Looking forward to any kind of feedback, please take a look.
>
> I created a short PR with a possible implementation - 
> https://github.com/apache/kafka/pull/11549
>
> --
> Igor
>
>
>
>> On 18 Oct 2021, at 15:11, Igor Soarez <soa...@apple.com.INVALID> wrote:
>> 
>> Hi all,
>> 
>> I'd like to propose that we simplify the operation of KRaft servers a bit by 
>> removing the requirement to run kafka-storage.sh for new storage directories.
>> 
>> Please take a look at the KIP and provide your feedback:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-785%3A+Automatic+storage+formatting
>> 
>> --
>> Igor
>>
