Re: implications of using large number of topics....

Jason Rosenberg Sat, 13 Oct 2012 23:43:35 -0700

Cool,

What's the schedule for 0.8 coming out?  Are there any pre-release versions?


Jason

On Sat, Oct 13, 2012 at 8:37 PM, Jun Rao <jun...@gmail.com> wrote:

> Jason,
>
> The issue with 0.7 is that a topic exists on every broker and every time
> one adds a new broker, some additional partitions for each existing topic
> are added to the new broker. This is going to change in 0.8. A topic has a
> fixed number of partitions, independent of the # of brokers. So, by adding
> more brokers, we can support more topics in a cluster.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 12, 2012 at 10:55 AM, Jason Rosenberg <j...@squareup.com>
> wrote:
>
> > Has there ever been a thought to better handle a large number of topics?
> >  Prior discussions?  Or would it likely be too great of a change to the
> way
> > kafka works, no matter what?
> >
> > I'm wondering if there's a way to have a notion of multiple "virtual"
> > topics which are internally managed as members of a single topic "group",
> > but which at the api level, appear to be unique topics, from the client
> > perspective.
> >
> > Naturally, it would be straightforward to implement something like this
> by
> > wrapping the current client apis, but I'm wondering if there's any
> benefit
> > to building it into the internals.  This would still have the downside
> that
> > a client subscribing to a virtual topic would have to, under the covers,
> > sift through lots of messages it's not interested in.
> >
> > Any other interesting approaches?
> >
> > Jason
> >
> >
> > On Thu, Oct 11, 2012 at 10:48 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Mathias,
> > >
> > > What matters is the total # partitions since each corresponds to a
> > separate
> > > directory on disk. It doesn't matter how may topics those partitions
> are
> > > from.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <
> > > mathias.soederb...@gmail.com> wrote:
> > >
> > > > Hey all,
> > > >
> > > > This is a quite interesting topic (no pun intended), and I've seen it
> > > come
> > > > up at least once before.
> > > >
> > > > Me and a friend started experimenting with Kafka and ZooKeeper a
> little
> > > > while ago (building a publisher / subscriber system with consistent
> > > hashing
> > > > and whatnot) and currently we're using around 300 topics, all with
> one
> > > > partition each. So far we haven't really done any serious performance
> > > > testing, but I'm planning to do so in the following weeks. But I've
> > got a
> > > > few questions regardless:
> > > >
> > > >
> > > > Does / should it make any difference in performance when one has a
> lot
> > of
> > > > topics compared to having one topic with a lot of partitions? I'm
> > > imagining
> > > > that the system still needs to keep the same number of file
> descriptors
> > > > open, but I'm not sure how this would affect reads and writes? Are we
> > > going
> > > > to run into more random reads and writes by using a lot of topics
> > > compared
> > > > to using one topic with a lot of partitions instead? Can't really
> wrap
> > my
> > > > head around this right now, mostly because of my rather limited
> > knowledge
> > > > about how disks and page caches work.
> > > >
> > > > Could also add that we're mostly doing sequential reads (in rare
> cases
> > we
> > > > have to rewind a topic) and that the number of topics doesn't change.
> > > >
> > > > On 11 October 2012 05:13, Taylor Gautier <tgaut...@gmail.com> wrote:
> > > >
> > > > > We used pattern #1 at Tagged.  I wouldn't recommend it unless
> you're
> > > > really
> > > > > committed.  It took a lot of work to get it working right.
> > > > >
> > > > > a) Performance degraded non-linearly (read it fell off a cliff)
> when
> > > > > brokers were managing more than about 20k topics.  This was on a
> > Linux
> > > > RHEL
> > > > > 5.3 system with EXT3.  YMMV.
> > > > >
> > > > > b) Startup time is significantly longer for a broker that is
> > restarted
> > > > due
> > > > > to communication with ZK to sync up on those topics.
> > > > >
> > > > > c) If topics are short lived, even if Kafka expires the data
> segments
> > > > using
> > > > > it's standard 0.7 cleaner, the directory name for the topic will
> > still
> > > > > exist on disk and the topic is still considered "active" (in
> memory)
> > in
> > > > > Kafka.  This causes problems - see a above (open file handles).
> > > > >
> > > > > d) Message latency is affected.  Kafka syncs messages to disk if x
> > > > messages
> > > > > have buffered in memory, or y seconds have elapsed (both
> > configurable).
> > > >  If
> > > > > you have few topics and many messages (pattern #2), you will be
> > hitting
> > > > the
> > > > > x limit quite often, and get good throughput.  However, if you have
> > > many
> > > > > topics and few messages per topic (pattern #1), you will have to
> rely
> > > on
> > > > > the y threshold to flush to disk, and setting this too low can
> impact
> > > > > performance (throughput) in a significant way.  Jay already
> mentioned
> > > > this
> > > > > as random writes.
> > > > >
> > > > > We had to implement a number of solutions ourselves to resolve
> these
> > > > > issues, namely:
> > > > >
> > > > > #1 No ZK.  This means that all of the automatic partitioning done
> by
> > > > Kafka
> > > > > is not available, so we had to roll our own (luckily Tagged is
> pretty
> > > > used
> > > > > to scaling systems so there was much in-house expertise).  The
> > solution
> > > > > here was to implement a R/W proxy layer of machines to intercept
> > > messages
> > > > > and read/write to/from Kafka handling the sharding at the proxy
> > layer.
> > > > >  Because most of our messages were coming from PHP and we didn't
> want
> > > to
> > > > > use TCP we needed a UDP/TCP bridge/proxy anyway so this wasn't a
> huge
> > > > deal
> > > > > (also, we wanted strict ordering of messages, so we needed a shard
> by
> > > > topic
> > > > > feature anyway (I believe this can be done in 0.7 but we started
> with
> > > > 0.6)
> > > > >
> > > > > #2 Custom cleaner.  We implemented an extra cleanup task inside the
> > > Kafka
> > > > > process that could completely remove a topic from memory and disk.
> >  For
> > > > > clients, this sometimes meant that a subscribed topic suddenly
> > changed
> > > > it's
> > > > > physical offset from some offset X to 0, but that's ok, while
> > > technically
> > > > > it probably would never happen theoretically clients should have to
> > > > handle
> > > > > this case anyway because the Kafka physical message space is
> limited
> > to
> > > > > 64-bits (again, unlikely to ever wrap in practice, but you never
> > know).
> > > > >  Anyway it's pretty easy to handle this just catch the "invalid
> > offset"
> > > > > error Kafka gives and start at 0.
> > > > >
> > > > > #3 Low threshold for flush.  This gave us good latency, but poor
> > > > throughput
> > > > > (relatively speaking).  We had more than enough throughput, but it
> > was
> > > > > nowhere near what Kafka can do when setup in pattern #1.
> > > > >
> > > > > Given that you want to manage "hundreds of thousands of topics"
> that
> > > may
> > > > > mean that you would need 10's of Kafka brokers which could be
> another
> > > > > source of problems - it's more cost, more management, and more
> > sources
> > > of
> > > > > failure.  SSD's may help solve this problem btw, but now you are
> > > talking
> > > > > expensive machines rather than using just off the shelf cheapo
> > servers
> > > > with
> > > > > standard SATA drives.
> > > > >
> > > > > On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Yes the footprint of a topic is one directory per partition (a
> > topic
> > > > can
> > > > > > have many subpartitions per partitions). Each directory contains
> > one
> > > or
> > > > > > more files (depending on how much data you are retaining and the
> > > > segment
> > > > > > size, both configurable).
> > > > > >
> > > > > > In addition to having lots of open files, which certainly scales
> up
> > > to
> > > > > the
> > > > > > hundreds of thousands, this will also impact the I/O pattern. As
> > the
> > > > > number
> > > > > > of files increases the data written to each file necessarily
> > > decreases.
> > > > > > This likely means lots of random I/O. The OS can group together
> > > writes,
> > > > > but
> > > > > > if you only doing a single write per topic every now and then
> there
> > > > will
> > > > > be
> > > > > > nothing to group and you will lots of small random I/O. This will
> > > > > > definitely impact throughput. I don't know where the practical
> > limits
> > > > are
> > > > > > we have tested up to ~500 topics and see reasonable performance.
> We
> > > > have
> > > > > > not done serious performance testing with tens of thousands of
> > topics
> > > > or
> > > > > > more.
> > > > > >
> > > > > > In addition to the filesystem concerns there is metadata kept for
> > > each
> > > > > > partition in zk, and I believe zk keeps this metadata in memory.
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <
> j...@squareup.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Ok,
> > > > > > >
> > > > > > > Perhaps for the sake of argument, consider the question if we
> > have
> > > > > just 1
> > > > > > > kafka broker.  It sounds like it will need to keep a file
> handle
> > > open
> > > > > for
> > > > > > > each topic?  Is that right?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > > On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <
> > > > > neha.narkh...@gmail.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > Hi Jason,
> > > > > > > >
> > > > > > > > We use option #2 at LinkedIn for metrics and tracking data.
> > > > > Supporting
> > > > > > > > Option #1 in Kafka 0.7 has its challenges since every topic
> is
> > > > stored
> > > > > > > > on every broker, by design. Hence, the number of topics a
> > cluster
> > > > can
> > > > > > > > support is limited by the IO and number of open file handles
> on
> > > > each
> > > > > > > > broker. After Kafka 0.8 is released, the distribution of
> topics
> > > to
> > > > > > > > brokers is user defined and can scale out with the number of
> > > > brokers.
> > > > > > > > Having said that, some Kafka users have successfully deployed
> > > Kafka
> > > > > > > > 0.7 clusters hosting very high number of topics. I hope they
> > can
> > > > > share
> > > > > > > > their experiences here.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Neha
> > > > > > > >
> > > > > > > > On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <
> > > j...@squareup.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I'm exploring using kafka for the first time.
> > > > > > > > >
> > > > > > > > > I'm contemplating a system where we transmit metric data at
> > > > regular
> > > > > > > > > intervals to kafka.  One question I have is whether to
> > generate
> > > > > > simple
> > > > > > > > > messages with very little meta data (just timestamp and
> > value),
> > > > and
> > > > > > > > keeping
> > > > > > > > > meta data like the name/host/app that generated metric out
> of
> > > the
> > > > > > > > message,
> > > > > > > > > and have that be embodied in the name of the topic itself
> > > > instead.
> > > > > > > > >  Alternatively, we could have a relatively small number of
> > > > topics,
> > > > > > > which
> > > > > > > > > contain messages which include source meta data along with
> > the
> > > > > > > timestamp
> > > > > > > > > and metric value in each message.
> > > > > > > > >
> > > > > > > > > 1. On one hand, we'd have a large number of topics (say
> > several
> > > > > > hundred
> > > > > > > > > thousand topics) with small messages, generated at a steady
> > > rate
> > > > > (say
> > > > > > > one
> > > > > > > > > every 10 seconds).
> > > > > > > > >
> > > > > > > > > 2. Alternatively, we could have just few topics, which
> > receive
> > > > > > several
> > > > > > > > > hundred thousand messages every 10 seconds, which contain 2
> > or
> > > 3
> > > > > > times
> > > > > > > > more
> > > > > > > > > data per message.
> > > > > > > > >
> > > > > > > > > I'm wondering if kafka has any performance characteristics
> > that
> > > > > > differ
> > > > > > > > for
> > > > > > > > > the 2 scenarios.
> > > > > > > > >
> > > > > > > > > I like #1 because it simplifies targeted message
> consumption,
> > > and
> > > > > > > enables
> > > > > > > > > more interesting use of TopicFilter'ing.  But I'm unsure
> > > whether
> > > > > > there
> > > > > > > > > might be performance concerns with kafka (does it have to
> do
> > > more
> > > > > > work
> > > > > > > to
> > > > > > > > > separately manage each topic?).  Is this a common use case,
> > or
> > > > not?
> > > > > > > > >
> > > > > > > > > Thanks for any insight.
> > > > > > > > >
> > > > > > > > > Jason
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: implications of using large number of topics....

Reply via email to