Hi Jason,

We use option #2 at LinkedIn for metrics and tracking data. Supporting
option #1 in Kafka 0.7 has its challenges since every topic is stored
on every broker, by design. Hence, the number of topics a cluster can
support is limited by the IO and the number of open file handles on each
broker. Once Kafka 0.8 is released, the distribution of topics to
brokers will be user defined and can scale out with the number of brokers.
Having said that, some Kafka users have successfully deployed Kafka
0.7 clusters hosting a very high number of topics. I hope they can share
their experiences here.
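
To make the two layouts concrete, here is a rough Python sketch. The
topic naming scheme (metrics.<app>.<host>.<name>), the JSON encoding and
the function names are all assumptions for illustration, not anything
Kafka prescribes, and no Kafka client is involved:

```python
import json


def encode_option_1(name, host, app, timestamp, value):
    """Option #1: metadata lives in the topic name, payload stays tiny."""
    # Hypothetical naming scheme: one topic per (app, host, metric name).
    topic = f"metrics.{app}.{host}.{name}"
    payload = json.dumps({"ts": timestamp, "value": value})
    return topic, payload


def encode_option_2(name, host, app, timestamp, value):
    """Option #2: one shared topic, metadata travels in every message."""
    topic = "metrics"  # hypothetical single topic name
    payload = json.dumps(
        {"name": name, "host": host, "app": app, "ts": timestamp, "value": value}
    )
    return topic, payload


def distinct_topics(encoder, sources):
    """Count how many topics a layout needs for the given metric sources."""
    # sources is an iterable of (name, host, app) tuples.
    return len({encoder(name, host, app, 0, 0.0)[0] for name, host, app in sources})
```

With 3 hosts reporting 2 metrics each, the first layout needs 6 topics
and the second needs 1. Scaled to several hundred thousand sources, that
topic count is what runs into the per-broker limits described above,
while the second layout pays for it with a larger payload per message.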

Thanks,
Neha

On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:
> Hi,
>
> I'm exploring using kafka for the first time.
>
> I'm contemplating a system where we transmit metric data at regular
> intervals to kafka.  One question I have is whether to generate simple
> messages with very little metadata (just timestamp and value), keeping
> metadata like the name/host/app that generated the metric out of the
> message and embodying it in the name of the topic itself instead.
> Alternatively, we could have a relatively small number of topics, which
> contain messages that include source metadata along with the timestamp
> and metric value in each message.
>
> 1. On one hand, we'd have a large number of topics (say several hundred
> thousand topics) with small messages, generated at a steady rate (say one
> every 10 seconds).
>
> 2. Alternatively, we could have just a few topics, which receive several
> hundred thousand messages every 10 seconds, each containing 2 or 3 times
> more data per message.
>
> I'm wondering if kafka has any performance characteristics that differ for
> the 2 scenarios.
>
> I like #1 because it simplifies targeted message consumption, and enables
> more interesting use of TopicFilter'ing.  But I'm unsure whether there
> might be performance concerns with kafka (does it have to do more work to
> separately manage each topic?).  Is this a common use case, or not?
>
> Thanks for any insight.
>
> Jason
