Has there ever been a thought to better handle a large number of topics? Prior discussions? Or would it likely be too great of a change to the way kafka works, no matter what?
I'm wondering if there's a way to have a notion of multiple "virtual" topics which are internally managed as members of a single topic "group", but which at the api level, appear to be unique topics, from the client perspective. Naturally, it would be straightforward to implement something like this by wrapping the current client apis, but I'm wondering if there's any benefit to building it into the internals. This would still have the downside that a client subscribing to a virtual topic would have to, under the covers, sift through lots of messages it's not interested in. Any other interesting approaches? Jason On Thu, Oct 11, 2012 at 10:48 PM, Jun Rao <jun...@gmail.com> wrote: > Mathias, > > What matters is the total # partitions since each corresponds to a separate > directory on disk. It doesn't matter how may topics those partitions are > from. > > Thanks, > > Jun > > On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg < > mathias.soederb...@gmail.com> wrote: > > > Hey all, > > > > This is a quite interesting topic (no pun intended), and I've seen it > come > > up at least once before. > > > > Me and a friend started experimenting with Kafka and ZooKeeper a little > > while ago (building a publisher / subscriber system with consistent > hashing > > and whatnot) and currently we're using around 300 topics, all with one > > partition each. So far we haven't really done any serious performance > > testing, but I'm planning to do so in the following weeks. But I've got a > > few questions regardless: > > > > > > Does / should it make any difference in performance when one has a lot of > > topics compared to having one topic with a lot of partitions? I'm > imagining > > that the system still needs to keep the same number of file descriptors > > open, but I'm not sure how this would affect reads and writes? Are we > going > > to run into more random reads and writes by using a lot of topics > compared > > to using one topic with a lot of partitions instead? Can't really wrap my > > head around this right now, mostly because of my rather limited knowledge > > about how disks and page caches work. > > > > Could also add that we're mostly doing sequential reads (in rare cases we > > have to rewind a topic) and that the number of topics doesn't change. > > > > On 11 October 2012 05:13, Taylor Gautier <tgaut...@gmail.com> wrote: > > > > > We used pattern #1 at Tagged. I wouldn't recommend it unless you're > > really > > > committed. It took a lot of work to get it working right. > > > > > > a) Performance degraded non-linearly (read it fell off a cliff) when > > > brokers were managing more than about 20k topics. This was on a Linux > > RHEL > > > 5.3 system with EXT3. YMMV. > > > > > > b) Startup time is significantly longer for a broker that is restarted > > due > > > to communication with ZK to sync up on those topics. > > > > > > c) If topics are short lived, even if Kafka expires the data segments > > using > > > it's standard 0.7 cleaner, the directory name for the topic will still > > > exist on disk and the topic is still considered "active" (in memory) in > > > Kafka. This causes problems - see a above (open file handles). > > > > > > d) Message latency is affected. Kafka syncs messages to disk if x > > messages > > > have buffered in memory, or y seconds have elapsed (both configurable). > > If > > > you have few topics and many messages (pattern #2), you will be hitting > > the > > > x limit quite often, and get good throughput. However, if you have > many > > > topics and few messages per topic (pattern #1), you will have to rely > on > > > the y threshold to flush to disk, and setting this too low can impact > > > performance (throughput) in a significant way. Jay already mentioned > > this > > > as random writes. > > > > > > We had to implement a number of solutions ourselves to resolve these > > > issues, namely: > > > > > > #1 No ZK. This means that all of the automatic partitioning done by > > Kafka > > > is not available, so we had to roll our own (luckily Tagged is pretty > > used > > > to scaling systems so there was much in-house expertise). The solution > > > here was to implement a R/W proxy layer of machines to intercept > messages > > > and read/write to/from Kafka handling the sharding at the proxy layer. > > > Because most of our messages were coming from PHP and we didn't want > to > > > use TCP we needed a UDP/TCP bridge/proxy anyway so this wasn't a huge > > deal > > > (also, we wanted strict ordering of messages, so we needed a shard by > > topic > > > feature anyway (I believe this can be done in 0.7 but we started with > > 0.6) > > > > > > #2 Custom cleaner. We implemented an extra cleanup task inside the > Kafka > > > process that could completely remove a topic from memory and disk. For > > > clients, this sometimes meant that a subscribed topic suddenly changed > > it's > > > physical offset from some offset X to 0, but that's ok, while > technically > > > it probably would never happen theoretically clients should have to > > handle > > > this case anyway because the Kafka physical message space is limited to > > > 64-bits (again, unlikely to ever wrap in practice, but you never know). > > > Anyway it's pretty easy to handle this just catch the "invalid offset" > > > error Kafka gives and start at 0. > > > > > > #3 Low threshold for flush. This gave us good latency, but poor > > throughput > > > (relatively speaking). We had more than enough throughput, but it was > > > nowhere near what Kafka can do when setup in pattern #1. > > > > > > Given that you want to manage "hundreds of thousands of topics" that > may > > > mean that you would need 10's of Kafka brokers which could be another > > > source of problems - it's more cost, more management, and more sources > of > > > failure. SSD's may help solve this problem btw, but now you are > talking > > > expensive machines rather than using just off the shelf cheapo servers > > with > > > standard SATA drives. > > > > > > On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> > wrote: > > > > > > > Yes the footprint of a topic is one directory per partition (a topic > > can > > > > have many subpartitions per partitions). Each directory contains one > or > > > > more files (depending on how much data you are retaining and the > > segment > > > > size, both configurable). > > > > > > > > In addition to having lots of open files, which certainly scales up > to > > > the > > > > hundreds of thousands, this will also impact the I/O pattern. As the > > > number > > > > of files increases the data written to each file necessarily > decreases. > > > > This likely means lots of random I/O. The OS can group together > writes, > > > but > > > > if you only doing a single write per topic every now and then there > > will > > > be > > > > nothing to group and you will lots of small random I/O. This will > > > > definitely impact throughput. I don't know where the practical limits > > are > > > > we have tested up to ~500 topics and see reasonable performance. We > > have > > > > not done serious performance testing with tens of thousands of topics > > or > > > > more. > > > > > > > > In addition to the filesystem concerns there is metadata kept for > each > > > > partition in zk, and I believe zk keeps this metadata in memory. > > > > > > > > -Jay > > > > > > > > On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <j...@squareup.com> > > > wrote: > > > > > > > > > Ok, > > > > > > > > > > Perhaps for the sake of argument, consider the question if we have > > > just 1 > > > > > kafka broker. It sounds like it will need to keep a file handle > open > > > for > > > > > each topic? Is that right? > > > > > > > > > > Jason > > > > > > > > > > On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede < > > > neha.narkh...@gmail.com > > > > > >wrote: > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > We use option #2 at LinkedIn for metrics and tracking data. > > > Supporting > > > > > > Option #1 in Kafka 0.7 has its challenges since every topic is > > stored > > > > > > on every broker, by design. Hence, the number of topics a cluster > > can > > > > > > support is limited by the IO and number of open file handles on > > each > > > > > > broker. After Kafka 0.8 is released, the distribution of topics > to > > > > > > brokers is user defined and can scale out with the number of > > brokers. > > > > > > Having said that, some Kafka users have successfully deployed > Kafka > > > > > > 0.7 clusters hosting very high number of topics. I hope they can > > > share > > > > > > their experiences here. > > > > > > > > > > > > Thanks, > > > > > > Neha > > > > > > > > > > > > On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg < > j...@squareup.com > > > > > > > > wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I'm exploring using kafka for the first time. > > > > > > > > > > > > > > I'm contemplating a system where we transmit metric data at > > regular > > > > > > > intervals to kafka. One question I have is whether to generate > > > > simple > > > > > > > messages with very little meta data (just timestamp and value), > > and > > > > > > keeping > > > > > > > meta data like the name/host/app that generated metric out of > the > > > > > > message, > > > > > > > and have that be embodied in the name of the topic itself > > instead. > > > > > > > Alternatively, we could have a relatively small number of > > topics, > > > > > which > > > > > > > contain messages which include source meta data along with the > > > > > timestamp > > > > > > > and metric value in each message. > > > > > > > > > > > > > > 1. On one hand, we'd have a large number of topics (say several > > > > hundred > > > > > > > thousand topics) with small messages, generated at a steady > rate > > > (say > > > > > one > > > > > > > every 10 seconds). > > > > > > > > > > > > > > 2. Alternatively, we could have just few topics, which receive > > > > several > > > > > > > hundred thousand messages every 10 seconds, which contain 2 or > 3 > > > > times > > > > > > more > > > > > > > data per message. > > > > > > > > > > > > > > I'm wondering if kafka has any performance characteristics that > > > > differ > > > > > > for > > > > > > > the 2 scenarios. > > > > > > > > > > > > > > I like #1 because it simplifies targeted message consumption, > and > > > > > enables > > > > > > > more interesting use of TopicFilter'ing. But I'm unsure > whether > > > > there > > > > > > > might be performance concerns with kafka (does it have to do > more > > > > work > > > > > to > > > > > > > separately manage each topic?). Is this a common use case, or > > not? > > > > > > > > > > > > > > Thanks for any insight. > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > >