OK, for the sake of argument, consider the case where we have just one Kafka broker. It sounds like it would need to keep a file handle open for each topic. Is that right?
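
To make the concern concrete, here is a rough back-of-envelope sketch in Python. It assumes, without confirmation of Kafka's internals, that a broker holds one open handle per log segment file, per partition, per topic. The function names, the 300,000-topic figure, and the 65,536 descriptor limit are all made up for illustration.

```python
# Back-of-envelope estimate of open file handles on a single broker.
# ASSUMPTION: one open handle per log segment, per partition, per topic.
# This is a guess for discussion, not a statement of how Kafka works.

def estimate_open_handles(num_topics, partitions_per_topic=1, segments_per_partition=1):
    """Return the estimated number of open file handles (hypothetical model)."""
    return num_topics * partitions_per_topic * segments_per_partition


def fits_within_limit(handles_needed, fd_limit):
    """Return True if the estimate stays within the process descriptor limit."""
    return handles_needed <= fd_limit


if __name__ == "__main__":
    topics = 300_000    # illustrative stand-in for "several hundred thousand"
    fd_limit = 65_536   # hypothetical per-process open-file limit
    needed = estimate_open_handles(topics)
    print(f"estimated handles: {needed}, fits: {fits_within_limit(needed, fd_limit)}")
```

If that model is roughly right, option #1 would need the broker's open-file limit raised well above typical defaults, while option #2 would stay at a handful of handles.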

Jason

On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> Hi Jason,
>
> We use option #2 at LinkedIn for metrics and tracking data. Supporting
> Option #1 in Kafka 0.7 has its challenges since every topic is stored
> on every broker, by design. Hence, the number of topics a cluster can
> support is limited by the IO and number of open file handles on each
> broker. After Kafka 0.8 is released, the distribution of topics to
> brokers is user defined and can scale out with the number of brokers.
> Having said that, some Kafka users have successfully deployed Kafka
> 0.7 clusters hosting very high number of topics. I hope they can share
> their experiences here.
>
> Thanks,
> Neha
>
> On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > Hi,
> >
> > I'm exploring using kafka for the first time.
> >
> > I'm contemplating a system where we transmit metric data at regular
> > intervals to kafka. One question I have is whether to generate simple
> > messages with very little meta data (just timestamp and value), and keeping
> > meta data like the name/host/app that generated metric out of the message,
> > and have that be embodied in the name of the topic itself instead.
> > Alternatively, we could have a relatively small number of topics, which
> > contain messages which include source meta data along with the timestamp
> > and metric value in each message.
> >
> > 1. On one hand, we'd have a large number of topics (say several hundred
> > thousand topics) with small messages, generated at a steady rate (say one
> > every 10 seconds).
> >
> > 2. Alternatively, we could have just few topics, which receive several
> > hundred thousand messages every 10 seconds, which contain 2 or 3 times more
> > data per message.
> >
> > I'm wondering if kafka has any performance characteristics that differ for
> > the 2 scenarios.
> >
> > I like #1 because it simplifies targeted message consumption, and enables
> > more interesting use of TopicFilter'ing. But I'm unsure whether there
> > might be performance concerns with kafka (does it have to do more work to
> > separately manage each topic?). Is this a common use case, or not?
> >
> > Thanks for any insight.
> >
> > Jason