Jason,

The issue with 0.7 is that a topic exists on every broker, and every time one adds a new broker, additional partitions for each existing topic are added to the new broker. This is going to change in 0.8: a topic has a fixed number of partitions, independent of the # of brokers, so by adding more brokers we can support more topics in a cluster.
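To make the arithmetic concrete, here is a rough back-of-the-envelope sketch (plain Java; the topic count and per-topic partition counts are made-up illustrations, not Kafka defaults) of why per-broker load stays flat in 0.7 but spreads out as brokers are added in 0.8:

    // Back-of-the-envelope comparison of the two models described above.
    // All numbers are invented for illustration; none are Kafka defaults.
    public class PartitionMath {
        public static void main(String[] args) {
            int topics = 300;
            int partitionsPerTopicPerBroker07 = 1; // 0.7: every broker hosts partitions of every topic
            int fixedPartitionsPerTopic08 = 4;     // 0.8: chosen once per topic (hypothetical value)

            for (int brokers : new int[] {2, 4, 8}) {
                // 0.7-style: per-broker directory count does not shrink as brokers are added
                long perBroker07 = (long) topics * partitionsPerTopicPerBroker07;
                // 0.8-style: the fixed partition total is spread across the brokers
                long perBroker08 = (long) topics * fixedPartitionsPerTopic08 / brokers;
                System.out.printf("brokers=%d  0.7 dirs/broker=%d  0.8 dirs/broker=%d%n",
                        brokers, perBroker07, perBroker08);
            }
        }
    }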
Thanks,

Jun

On Fri, Oct 12, 2012 at 10:55 AM, Jason Rosenberg <j...@squareup.com> wrote:
> Has there ever been a thought to better handle a large number of topics? Prior discussions? Or would it likely be too great a change to the way Kafka works, no matter what?
>
> I'm wondering if there's a way to have a notion of multiple "virtual" topics which are internally managed as members of a single topic "group", but which at the API level appear to be unique topics from the client perspective.
>
> Naturally, it would be straightforward to implement something like this by wrapping the current client APIs, but I'm wondering if there's any benefit to building it into the internals. This would still have the downside that a client subscribing to a virtual topic would have to, under the covers, sift through lots of messages it's not interested in.
>
> Any other interesting approaches?
>
> Jason
>
> On Thu, Oct 11, 2012 at 10:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > Mathias,
> >
> > What matters is the total # of partitions, since each one corresponds to a separate directory on disk. It doesn't matter how many topics those partitions are from.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:
> > > Hey all,
> > >
> > > This is a quite interesting topic (no pun intended), and I've seen it come up at least once before.
> > >
> > > A friend and I started experimenting with Kafka and ZooKeeper a little while ago (building a publisher/subscriber system with consistent hashing and whatnot), and currently we're using around 300 topics, all with one partition each. So far we haven't really done any serious performance testing, but I'm planning to do so in the following weeks. I've got a few questions regardless:
> > >
> > > Does / should it make any difference in performance when one has a lot of topics compared to having one topic with a lot of partitions? I imagine the system still needs to keep the same number of file descriptors open, but I'm not sure how this would affect reads and writes. Are we going to run into more random reads and writes by using a lot of topics compared to using one topic with a lot of partitions instead? I can't really wrap my head around this right now, mostly because of my rather limited knowledge of how disks and page caches work.
> > >
> > > I could also add that we're mostly doing sequential reads (in rare cases we have to rewind a topic) and that the number of topics doesn't change.
> > >
> > > On 11 October 2012 05:13, Taylor Gautier <tgaut...@gmail.com> wrote:
> > > > We used pattern #1 at Tagged. I wouldn't recommend it unless you're really committed. It took a lot of work to get it working right.
> > > >
> > > > a) Performance degraded non-linearly (read: it fell off a cliff) when brokers were managing more than about 20k topics. This was on a Linux RHEL 5.3 system with ext3. YMMV.
> > > >
> > > > b) Startup time is significantly longer for a broker that is restarted, due to communication with ZK to sync up on those topics.
> > > >
> > > > c) If topics are short-lived, even if Kafka expires the data segments using its standard 0.7 cleaner, the directory name for the topic will still exist on disk and the topic is still considered "active" (in memory) in Kafka. This causes problems - see (a) above (open file handles).
> > > >
> > > > d) Message latency is affected. Kafka syncs messages to disk if x messages have buffered in memory, or y seconds have elapsed (both configurable). If you have few topics and many messages (pattern #2), you will be hitting the x limit quite often and get good throughput. However, if you have many topics and few messages per topic (pattern #1), you will have to rely on the y threshold to flush to disk, and setting this too low can impact performance (throughput) in a significant way. Jay already mentioned this as random writes.
> > > >
> > > > We had to implement a number of solutions ourselves to resolve these issues, namely:
> > > >
> > > > #1 No ZK. This means that all of the automatic partitioning done by Kafka is not available, so we had to roll our own (luckily Tagged is pretty used to scaling systems, so there was much in-house expertise). The solution here was to implement a R/W proxy layer of machines to intercept messages and read/write to/from Kafka, handling the sharding at the proxy layer. Because most of our messages were coming from PHP and we didn't want to use TCP, we needed a UDP/TCP bridge/proxy anyway, so this wasn't a huge deal (also, we wanted strict ordering of messages, so we needed a shard-by-topic feature anyway; I believe this can be done in 0.7, but we started with 0.6).
> > > >
> > > > #2 Custom cleaner. We implemented an extra cleanup task inside the Kafka process that could completely remove a topic from memory and disk. For clients, this sometimes meant that a subscribed topic suddenly changed its physical offset from some offset X back to 0. That's OK - clients should handle this case anyway, because the Kafka physical message space is limited to 64 bits (unlikely to ever wrap in practice, but you never know). It's pretty easy to handle: just catch the "invalid offset" error Kafka gives and start at 0.
> > > >
> > > > #3 Low threshold for flush. This gave us good latency, but poor throughput (relatively speaking). We had more than enough throughput, but with pattern #1 it was nowhere near what Kafka can otherwise do.
> > > >
> > > > Given that you want to manage "hundreds of thousands of topics", that may mean you would need 10's of Kafka brokers, which could be another source of problems - it's more cost, more management, and more sources of failure. SSDs may help solve this problem, btw, but then you are talking expensive machines rather than just off-the-shelf cheapo servers with standard SATA drives.
> > > >
> > > > On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > Yes, the footprint of a topic is one directory per partition (a topic can have many partitions). Each directory contains one or more files (depending on how much data you are retaining and the segment size, both configurable).
> > > > >
> > > > > In addition to having lots of open files, which certainly scales up to the hundreds of thousands, this will also impact the I/O pattern. As the number of files increases, the data written to each file necessarily decreases. This likely means lots of random I/O. The OS can group together writes, but if you are only doing a single write per topic every now and then, there will be nothing to group and you will see lots of small random I/O. This will definitely impact throughput. I don't know where the practical limits are; we have tested up to ~500 topics and see reasonable performance. We have not done serious performance testing with tens of thousands of topics or more.
> > > > >
> > > > > In addition to the filesystem concerns, there is metadata kept for each partition in zk, and I believe zk keeps this metadata in memory.
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > Ok,
> > > > > >
> > > > > > Perhaps for the sake of argument, consider the question if we have just one Kafka broker. It sounds like it will need to keep a file handle open for each topic? Is that right?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > > > > Hi Jason,
> > > > > > >
> > > > > > > We use option #2 at LinkedIn for metrics and tracking data. Supporting option #1 in Kafka 0.7 has its challenges, since every topic is stored on every broker by design. Hence, the number of topics a cluster can support is limited by the I/O and the number of open file handles on each broker. After Kafka 0.8 is released, the distribution of topics to brokers is user-defined and can scale out with the number of brokers. Having said that, some Kafka users have successfully deployed Kafka 0.7 clusters hosting a very high number of topics. I hope they can share their experiences here.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Neha
> > > > > > >
> > > > > > > On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm exploring using Kafka for the first time.
> > > > > > > >
> > > > > > > > I'm contemplating a system where we transmit metric data to Kafka at regular intervals. One question I have is whether to generate simple messages with very little metadata (just timestamp and value), keeping metadata like the name/host/app that generated the metric out of the message and embodying it in the name of the topic itself instead. Alternatively, we could have a relatively small number of topics whose messages include the source metadata along with the timestamp and metric value.
> > > > > > > >
> > > > > > > > 1. On one hand, we'd have a large number of topics (say several hundred thousand topics) with small messages, generated at a steady rate (say one every 10 seconds).
> > > > > > > >
> > > > > > > > 2. Alternatively, we could have just a few topics, which receive several hundred thousand messages every 10 seconds and contain 2 or 3 times more data per message.
> > > > > > > >
> > > > > > > > I'm wondering if Kafka has any performance characteristics that differ for the 2 scenarios.
> > > > > > > >
> > > > > > > > I like #1 because it simplifies targeted message consumption and enables more interesting use of TopicFilter'ing. But I'm unsure whether there might be performance concerns with Kafka (does it have to do more work to separately manage each topic?). Is this a common use case, or not?
> > > > > > > >
> > > > > > > > Thanks for any insight.
> > > > > > > >
> > > > > > > > Jason
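P.S. To make the two layouts in the original question concrete, here is a minimal sketch (plain Java; every topic name, field, and value below is invented for illustration and not taken from any real setup). Option #1 pushes the source metadata into the topic name and keeps each payload tiny; option #2 keeps a handful of topics and carries the metadata inside each message:

    // Hypothetical sketch of the two metric layouts being compared.
    // Topic names, delimiters, and fields are illustrative only.
    public class MetricLayouts {

        // Option #1: metadata lives in the topic name; the message is just timestamp + value.
        static String option1Topic(String host, String app, String metric) {
            return "metrics." + host + "." + app + "." + metric; // one topic per source/metric pair
        }

        static String option1Message(long timestampMs, double value) {
            return timestampMs + ":" + value; // tiny payload
        }

        // Option #2: one shared topic; the metadata travels inside each (larger) message.
        static String option2Message(String host, String app, String metric,
                                      long timestampMs, double value) {
            return host + "," + app + "," + metric + "," + timestampMs + "," + value;
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            System.out.println(option1Topic("web42", "checkout", "latency_ms")
                    + " -> " + option1Message(now, 12.5));
            System.out.println("metrics -> "
                    + option2Message("web42", "checkout", "latency_ms", now, 12.5));
        }
    }

With #1 the brokers end up managing one directory per topic-partition for every source/metric pair, which is where the file-handle and random-I/O concerns discussed above come from; with #2 the consumer has to filter on the metadata fields itself.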