Mathias,

What matters is the total number of partitions, since each corresponds to a separate directory on disk. It doesn't matter how many topics those partitions are from.

Thanks,
Jun
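To make that concrete: each topic-partition lives in a directory named <topic>-<partition> under the broker's log directory, so 300 topics with one partition each and 1 topic with 300 partitions both produce 300 directories. A minimal sketch (plain Java; the log path is hypothetical, not Kafka code) that just counts them:

    import java.io.File;

    public class CountPartitionDirs {
        public static void main(String[] args) {
            // Hypothetical log directory; set via log.dir in server.properties.
            File logDir = new File("/tmp/kafka-logs");
            File[] entries = logDir.listFiles();
            int partitionDirs = 0;
            if (entries != null) {
                for (File f : entries) {
                    // Every topic-partition is one directory: <topic>-<partition>.
                    if (f.isDirectory()) partitionDirs++;
                }
            }
            System.out.println("partition directories: " + partitionDirs);
        }
    }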
On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:

Hey all,

This is a quite interesting topic (no pun intended), and I've seen it come up at least once before.

A friend and I started experimenting with Kafka and ZooKeeper a little while ago (building a publisher / subscriber system with consistent hashing and whatnot), and currently we're using around 300 topics, all with one partition each. So far we haven't really done any serious performance testing, but I'm planning to do so in the following weeks. But I've got a few questions regardless:

Does / should it make any difference in performance when one has a lot of topics compared to having one topic with a lot of partitions? I'm imagining that the system still needs to keep the same number of file descriptors open, but I'm not sure how this would affect reads and writes. Are we going to run into more random reads and writes by using a lot of topics compared to using one topic with a lot of partitions instead? I can't really wrap my head around this right now, mostly because of my rather limited knowledge of how disks and page caches work.

I could also add that we're mostly doing sequential reads (in rare cases we have to rewind a topic) and that the number of topics doesn't change.

On 11 October 2012 05:13, Taylor Gautier <tgaut...@gmail.com> wrote:

We used pattern #1 at Tagged. I wouldn't recommend it unless you're really committed. It took a lot of work to get it working right.

a) Performance degraded non-linearly (read: it fell off a cliff) when brokers were managing more than about 20k topics. This was on a Linux RHEL 5.3 system with EXT3. YMMV.

b) Startup time is significantly longer for a broker that is restarted, due to communication with ZK to sync up on those topics.

c) If topics are short-lived, even if Kafka expires the data segments using its standard 0.7 cleaner, the directory name for the topic will still exist on disk and the topic is still considered "active" (in memory) in Kafka. This causes problems - see (a) above (open file handles).

d) Message latency is affected. Kafka syncs messages to disk if x messages have buffered in memory, or y seconds have elapsed (both configurable). If you have few topics and many messages (pattern #2), you will be hitting the x limit quite often and get good throughput. However, if you have many topics and few messages per topic (pattern #1), you will have to rely on the y threshold to flush to disk, and setting this too low can impact performance (throughput) in a significant way. Jay already mentioned this as random writes.
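A sketch of that count-or-time flush policy (illustrative only, not Kafka's actual code; thresholds invented). With one busy topic the x limit fires constantly; spread the same write volume over 20k topics and only the y timeout ever fires, one small flush per topic:

    public class FlushPolicySketch {
        static final int X_MESSAGES = 1000;  // "x": flush after this many appends
        static final long Y_MILLIS = 3000;   // "y": or after this much elapsed time

        static int unflushed = 0;
        static long lastFlush = System.currentTimeMillis();

        // Called on every append; true means the log should be synced to disk.
        static boolean shouldFlush() {
            unflushed++;
            long now = System.currentTimeMillis();
            if (unflushed >= X_MESSAGES || now - lastFlush >= Y_MILLIS) {
                unflushed = 0;
                lastFlush = now;
                return true;
            }
            return false;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 5000; i++) {
                if (shouldFlush()) System.out.println("flush at append " + i);
            }
        }
    }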
We had to implement a number of solutions ourselves to resolve these issues, namely:

#1 No ZK. This means that all of the automatic partitioning done by Kafka is not available, so we had to roll our own (luckily Tagged is pretty used to scaling systems, so there was much in-house expertise). The solution here was to implement an R/W proxy layer of machines to intercept messages and read/write to/from Kafka, handling the sharding at the proxy layer. Because most of our messages were coming from PHP and we didn't want to use TCP, we needed a UDP/TCP bridge/proxy anyway, so this wasn't a huge deal (also, we wanted strict ordering of messages, so we needed a shard-by-topic feature anyway; I believe this can be done in 0.7, but we started with 0.6).

#2 Custom cleaner. We implemented an extra cleanup task inside the Kafka process that could completely remove a topic from memory and disk. For clients, this sometimes meant that a subscribed topic suddenly changed its physical offset from some offset X to 0, but that's OK: while in practice it would probably never happen, clients should have to handle this case anyway because the Kafka physical message space is limited to 64 bits (again, unlikely to ever wrap in practice, but you never know). Anyway, it's pretty easy to handle: just catch the "invalid offset" error Kafka gives and start at 0.
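That recovery could look roughly like this (a self-contained sketch; the exception type and fetch call are stand-ins for the real 0.6/0.7 consumer API, not actual Kafka classes):

    public class OffsetResetSketch {
        static class InvalidOffsetException extends RuntimeException {}

        // Placeholder for a real fetch; throws when the requested offset no
        // longer exists (e.g. the topic was cleaned and recreated at 0).
        static String fetch(long offset) {
            if (offset > 0) throw new InvalidOffsetException();
            return "message@" + offset;
        }

        public static void main(String[] args) {
            long offset = 42; // a saved offset that is now stale
            String msg;
            try {
                msg = fetch(offset);
            } catch (InvalidOffsetException e) {
                offset = 0;   // topic was reset: start again from the beginning
                msg = fetch(offset);
            }
            System.out.println(msg);
        }
    }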
#3 Low threshold for flush. This gave us good latency, but poor throughput (relatively speaking). We had more than enough throughput, but it was nowhere near what Kafka can do when set up in pattern #2.

Given that you want to manage "hundreds of thousands of topics", that may mean that you would need tens of Kafka brokers, which could be another source of problems - it's more cost, more management, and more sources of failure. SSDs may help solve this problem btw, but now you are talking expensive machines rather than just off-the-shelf cheapo servers with standard SATA drives.

On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Yes, the footprint of a topic is one directory per partition (a topic can have many partitions). Each directory contains one or more files (depending on how much data you are retaining and the segment size, both configurable).

In addition to having lots of open files, which certainly scales up to the hundreds of thousands, this will also impact the I/O pattern. As the number of files increases, the data written to each file necessarily decreases. This likely means lots of random I/O. The OS can group together writes, but if you're only doing a single write per topic every now and then, there will be nothing to group and you will get lots of small random I/O. This will definitely impact throughput. I don't know where the practical limits are; we have tested up to ~500 topics and see reasonable performance. We have not done serious performance testing with tens of thousands of topics or more.

In addition to the filesystem concerns, there is metadata kept for each partition in ZK, and I believe ZK keeps this metadata in memory.

-Jay

On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <j...@squareup.com> wrote:

Ok,

Perhaps for the sake of argument, consider the question if we have just 1 Kafka broker. It sounds like it will need to keep a file handle open for each topic? Is that right?

Jason

On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

Hi Jason,

We use option #2 at LinkedIn for metrics and tracking data. Supporting option #1 in Kafka 0.7 has its challenges, since every topic is stored on every broker, by design. Hence, the number of topics a cluster can support is limited by the I/O and number of open file handles on each broker. After Kafka 0.8 is released, the distribution of topics to brokers is user-defined and can scale out with the number of brokers. Having said that, some Kafka users have successfully deployed Kafka 0.7 clusters hosting a very high number of topics. I hope they can share their experiences here.

Thanks,
Neha

On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:

Hi,

I'm exploring using Kafka for the first time.

I'm contemplating a system where we transmit metric data to Kafka at regular intervals. One question I have is whether to generate simple messages with very little metadata (just timestamp and value), keeping metadata like the name/host/app that generated the metric out of the message and embodying it in the name of the topic itself instead. Alternatively, we could have a relatively small number of topics, which contain messages that include source metadata along with the timestamp and metric value in each message.

1. On one hand, we'd have a large number of topics (say several hundred thousand topics) with small messages, generated at a steady rate (say one every 10 seconds).

2. Alternatively, we could have just a few topics, which receive several hundred thousand messages every 10 seconds and contain 2 or 3 times more data per message.

I'm wondering if Kafka has any performance characteristics that differ for the two scenarios.

I like #1 because it simplifies targeted message consumption and enables more interesting use of TopicFilter'ing. But I'm unsure whether there might be performance concerns with Kafka (does it have to do more work to separately manage each topic?). Is this a common use case, or not?

Thanks for any insight.

Jason
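For reference, the two patterns differ only in where the metric's identity lives: in the topic name (pattern #1) or inside each message (pattern #2). A minimal sketch (all topic, host, and field names hypothetical):

    public class MetricPatterns {
        public static void main(String[] args) {
            long ts = System.currentTimeMillis();
            double value = 0.97;

            // Pattern #1: identity encoded in the topic name, tiny payload.
            String topic1 = "metrics.web42.myapp.cpu_load";
            String payload1 = ts + ":" + value;

            // Pattern #2: few shared topics, identity carried in each message.
            String topic2 = "metrics";
            String payload2 = "host=web42 app=myapp name=cpu_load ts=" + ts
                    + " value=" + value;

            System.out.println(topic1 + " -> " + payload1);
            System.out.println(topic2 + " -> " + payload2);
        }
    }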