Georgy created KAFKA-12900:
------------------------------
Summary: JBOD: Partitions count calculation does not take into
account topic name
Key: KAFKA-12900
URL: https://issues.apache.org/jira/browse/KAFKA-12900
Project: Kafka
Issue Type: Bug
Components: core, jbod
Affects Versions: 2.8.0
Reporter: Georgy
In [KAFKA-188|https://issues.apache.org/jira/browse/KAFKA-188] support for
multiple data directories was implemented. New partitions are spread across the
log dirs based on a partition count: the log dir with the fewest partitions is
selected as the next dir.
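To make the current behaviour concrete, here is a minimal Python sketch of
"pick the log dir with the fewest partitions". It is only an illustration, not
the broker code: the function name next_log_dir, the assignments mapping and
the tie-breaking by list order are all assumptions made for this example.

```python
def next_log_dir(log_dirs, assignments):
    """Return the log dir that currently holds the fewest partitions.

    log_dirs:    list of log dir names (hypothetical, e.g. "dir1")
    assignments: dict mapping (topic, partition) -> log dir name
    Ties are broken by the order of log_dirs (an assumption of this sketch).
    """
    counts = {d: 0 for d in log_dirs}
    for d in assignments.values():
        counts[d] += 1  # every partition counts the same, whatever its topic
    return min(log_dirs, key=lambda d: counts[d])
```

Note that the topic part of the key is never looked at, which is exactly the
limitation described in this issue.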
The problem is that topic names are not taken into account in this
calculation. As a result, the partitions of a "fat" topic can end up on fewer
disks than they should.
Example:
Fat topic "F" with partitions: F1, F2, ... , F6
Thin topic "t" with partitions: t1, t2, ... , t6
Log dirs on broker: dir1, dir2, dir3
What we have now in some cases:
dir1: t1 t2 t4 t6
dir2: F1 F3 F4 F5
dir3: F2 t3 t5 F6
There is a skew (dir2 holds four partitions of "F" while dir1 holds none), but
in terms of the partition count it is "balanced" because all of the log dirs
hold the same number of partitions.
It would be better to count, in each log dir, only the partitions of the topic
whose partition is being created, and then select the log dir with the fewest
partitions of that topic as the next one. The partitions from the example above
could then be spread like this:
dir1: t1 F1 t6 F6
dir2: F2 t2 t4 F4
dir3: F3 t3 t5 F5
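The proposed selection can be sketched in Python as follows. This is an
illustration of the idea only, not the attached patch: the function name
next_log_dir_for_topic, the assignments mapping and the tie-breaking rule
(total partition count, then list order) are assumptions made for this example.

```python
def next_log_dir_for_topic(log_dirs, assignments, topic):
    """Return the log dir holding the fewest partitions of `topic`.

    log_dirs:    list of log dir names (hypothetical, e.g. "dir1")
    assignments: dict mapping (topic, partition) -> log dir name
    Ties are broken by total partition count, then by the order of
    log_dirs (both tie-breakers are assumptions of this sketch).
    """
    per_topic = {d: 0 for d in log_dirs}
    total = {d: 0 for d in log_dirs}
    for (t, _partition), d in assignments.items():
        total[d] += 1
        if t == topic:
            per_topic[d] += 1
    return min(log_dirs, key=lambda d: (per_topic[d], total[d]))


# Place six partitions of the fat topic "F" and six of the thin topic "t".
log_dirs = ["dir1", "dir2", "dir3"]
assignments = {}
for topic in ("F", "t"):
    for partition in range(1, 7):
        assignments[(topic, partition)] = next_log_dir_for_topic(
            log_dirs, assignments, topic
        )
```

Because the per-topic count is the primary key, each topic ends up spread
evenly over the log dirs regardless of the order in which partitions are
created.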
In my case there will be no skew because the producer's partitioner is "round
robin" by default and partition sizes are the same.
I've prepared a patch; please review it.