Basically, the total number of partitions across all brokers determines the maximum parallelism of the consumer group. So if you want to have 12 consumer processes, then any partition count of 12 or more will work. That said, fewer files generally give better I/O efficiency, and more than, say, 10k files per machine is probably unwise.
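To make the parallelism limit concrete, here is a minimal, hypothetical sketch in plain Java (not Kafka code): each partition is consumed by at most one consumer in a group, so with fewer partitions than consumers some consumers sit idle. The class name and the round-robin assignment below are purely illustrative assumptions, not Kafka's actual assignment logic.

    import java.util.*;

    public class PartitionAssignmentSketch {
        // Illustrative assignment of partition ids to consumer ids; the
        // round-robin scheme here is an assumption, not Kafka's own algorithm.
        static Map<Integer, List<Integer>> assign(int numPartitions, int numConsumers) {
            Map<Integer, List<Integer>> assignment = new HashMap<>();
            for (int c = 0; c < numConsumers; c++) {
                assignment.put(c, new ArrayList<>());
            }
            for (int p = 0; p < numPartitions; p++) {
                assignment.get(p % numConsumers).add(p);
            }
            return assignment;
        }

        public static void main(String[] args) {
            // 8 partitions, 12 consumers: only 8 consumers get work, 4 sit idle.
            System.out.println(assign(8, 12));
            // 24 partitions, 12 consumers: 2 partitions each, all 12 are busy.
            System.out.println(assign(24, 12));
        }
    }

A common rule of thumb that follows from this: pick at least as many partitions as the largest consumer count you expect, ideally a multiple of it so the partitions divide evenly across consumers.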
At LinkedIn we find that most topics are just medium size, so we default to 1 partition and bump it up later for the handful of very large topics that need it.

-Jay

On Tue, May 15, 2012 at 3:00 AM, 刘明敏 <diveintotomor...@gmail.com> wrote:
> We are considering putting Kafka into production.
>
> One thing we are not sure about is how many partitions for a topic is
> suitable.
>
> I notice that in the operations page
> (https://cwiki.apache.org/confluence/display/KAFKA/Operations#kafka), LinkedIn
> chooses just one partition:
>
> kafka.num.partitions=1
>
> Though this has been explained in one discussion, I still don't quite get why
> you chose only 1 partition:
>
> Pierre-Yves Ritschard:
>> one partition only ? so the key here is that you start as many brokers
>> as there are consumers ?
>
> Jay Kreps:
>> Yeah technically that is not 100% correct. We have tuned about 10 topics by
>> adding more partitions to add parallelism.
>
> "tuned about 10 topics by adding more partitions to add parallelism": does
> this mean you dispatch the same group of logs into 10 different topics, so
> that you get more partitions on each broker (one partition per topic and 10
> topics in total, thus 10 partitions on one broker), and thus add parallelism?
>
> If yes, why not just increase the number of partitions of one particular
> topic?
>
> And how many partitions would you advise to assign to a topic?
>
> --
> Best Regards
>
> ----------------------
> 刘明敏 | mmLiu