We used pattern #1 at Tagged. I wouldn't recommend it unless you're really committed; it took a lot of work to get it working right. Some of the issues we ran into:
a) Performance degraded non-linearly (read: it fell off a cliff) once a broker was managing more than about 20k topics. This was on Linux (RHEL 5.3) with ext3; YMMV.

b) Startup time is significantly longer for a restarted broker, because it has to communicate with ZK to sync up on all those topics.

c) If topics are short-lived, then even after Kafka expires the data segments with its standard 0.7 cleaner, the topic's directory still exists on disk and the topic is still considered "active" (in memory) by Kafka. This causes problems, e.g. open file handles - see (a) above.

d) Message latency is affected. Kafka syncs messages to disk once x messages have buffered in memory or y seconds have elapsed (both configurable). If you have few topics and many messages (pattern #2), you will hit the x limit quite often and get good throughput. However, if you have many topics and few messages per topic (pattern #1), you have to rely on the y threshold to flush to disk, and setting it too low can hurt throughput significantly. Jay already mentioned this as the random-writes problem.

We had to implement a number of solutions ourselves to resolve these issues, namely:

#1 No ZK. This means all of the automatic partitioning done by Kafka is unavailable, so we had to roll our own (luckily Tagged is pretty used to scaling systems, so there was plenty of in-house expertise). The solution was a R/W proxy layer of machines that intercepts messages and reads/writes to/from Kafka, handling the sharding at the proxy layer (rough sketch below). Because most of our messages were coming from PHP and we didn't want to use TCP, we needed a UDP/TCP bridge/proxy anyway, so this wasn't a huge deal. Also, we wanted strict ordering of messages, so we needed shard-by-topic anyway (I believe this can be done in 0.7, but we started with 0.6).

#2 Custom cleaner. We implemented an extra cleanup task inside the Kafka process that can completely remove a topic from memory and disk. For clients, this sometimes means a subscribed topic suddenly changes its physical offset from some offset X back to 0. That's fine: while it would probably never happen otherwise, clients should handle this case anyway, because the Kafka physical offset space is limited to 64 bits (again, unlikely to ever wrap in practice, but you never know). It's easy to handle: just catch the "invalid offset" error Kafka returns and start again at 0 (sketch below).

#3 Low threshold for flush. This gave us good latency but, relatively speaking, poor throughput. We had more than enough throughput for our needs, but it was nowhere near what Kafka can do when set up as in pattern #2 (config snippet below).

Given that you want to manage "hundreds of thousands of topics", you may need tens of Kafka brokers, which could be another source of problems: more cost, more management, and more things to fail. SSDs may help with the random I/O btw, but then you're talking expensive machines rather than off-the-shelf cheapo servers with standard SATA drives.
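To make #1 concrete, here's a minimal, self-contained sketch of the shard-by-topic routing done at the proxy layer (illustrative names, not our actual code): hash the topic to pick a broker, so every message for a given topic lands on the same broker and strict ordering is preserved.

    import java.util.List;

    // Illustrative sketch: route each topic to a fixed broker by hashing
    // the topic name, so all messages for one topic stay strictly ordered
    // on a single broker.
    public class TopicSharder {
        private final List<String> brokers;  // e.g. "kafka01:9092", ...

        public TopicSharder(List<String> brokers) {
            this.brokers = brokers;
        }

        public String brokerFor(String topic) {
            // Mask the sign bit (Math.abs(Integer.MIN_VALUE) is still negative).
            int bucket = (topic.hashCode() & 0x7fffffff) % brokers.size();
            return brokers.get(bucket);
        }

        public static void main(String[] args) {
            TopicSharder s = new TopicSharder(
                List.of("kafka01:9092", "kafka02:9092", "kafka03:9092"));
            // The same topic always routes to the same broker:
            System.out.println(s.brokerFor("metrics.host42.cpu"));
        }
    }

The usual caveat applies: plain modulo hashing reshuffles most topics when you add or remove a broker, so something like consistent hashing is worth it if your broker set changes.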
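For #2, the client-side handling looks roughly like this. The fetchFrom() and OffsetOutOfRange names below are stand-ins for whatever your consumer API actually exposes (the 0.7 Java client throws an OffsetOutOfRangeException, if I recall correctly); the point is just the catch-and-restart-at-0 pattern.

    // Self-contained sketch; the stand-ins are NOT the real Kafka client API.
    public class OffsetResetLoop {

        // Stand-in for Kafka's "invalid offset" error.
        static class OffsetOutOfRange extends RuntimeException {}

        // Stand-in for a real fetch+process call; returns the next offset.
        // Throws once the (simulated) topic has been cleaned and recreated.
        static long fetchFrom(long offset) {
            if (offset > 100) throw new OffsetOutOfRange();
            return offset + 10;
        }

        public static void main(String[] args) {
            long offset = 95;  // pretend this was checkpointed before the cleanup
            for (int i = 0; i < 5; i++) {
                try {
                    offset = fetchFrom(offset);
                } catch (OffsetOutOfRange e) {
                    offset = 0;  // topic was removed and recreated: start over at 0
                }
                System.out.println("now at offset " + offset);
            }
        }
    }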
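And for #3 (and (d) above), the knobs involved are the broker flush settings. From memory, the 0.7 property names are roughly the following; double-check the server.properties shipped with your version:

    # flush a log once this many messages have accumulated...
    log.flush.interval=10000
    # ...or once this many ms have elapsed, whichever comes first
    log.default.flush.interval.ms=1000
    # how often the background flusher checks the logs
    log.default.flush.scheduler.interval.ms=1000

With many topics and only a few messages per topic, the message-count trigger rarely fires, so the ms setting effectively controls both your latency and your (random) write pattern.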
On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Yes, the footprint of a topic is one directory per partition (a topic can
> have many partitions). Each directory contains one or more files
> (depending on how much data you are retaining and the segment size, both
> configurable).
>
> In addition to having lots of open files, which certainly scales up to
> the hundreds of thousands, this will also impact the I/O pattern. As the
> number of files increases, the data written to each file necessarily
> decreases. This likely means lots of random I/O. The OS can group
> together writes, but if you are only doing a single write per topic every
> now and then, there will be nothing to group and you will get lots of
> small random I/O. This will definitely impact throughput. I don't know
> where the practical limits are; we have tested up to ~500 topics and see
> reasonable performance. We have not done serious performance testing with
> tens of thousands of topics or more.
>
> In addition to the filesystem concerns, there is metadata kept for each
> partition in ZK, and I believe ZK keeps this metadata in memory.
>
> -Jay
>
> On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > Ok,
> >
> > Perhaps for the sake of argument, consider the question if we have just
> > 1 kafka broker. It sounds like it will need to keep a file handle open
> > for each topic? Is that right?
> >
> > Jason
> >
> > On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede
> > <neha.narkh...@gmail.com> wrote:
> >
> > > Hi Jason,
> > >
> > > We use option #2 at LinkedIn for metrics and tracking data. Supporting
> > > option #1 in Kafka 0.7 has its challenges, since every topic is stored
> > > on every broker, by design. Hence, the number of topics a cluster can
> > > support is limited by the I/O and the number of open file handles on
> > > each broker. After Kafka 0.8 is released, the distribution of topics
> > > to brokers is user-defined and can scale out with the number of
> > > brokers. Having said that, some Kafka users have successfully deployed
> > > Kafka 0.7 clusters hosting a very high number of topics. I hope they
> > > can share their experiences here.
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > Hi,
> > > >
> > > > I'm exploring using kafka for the first time.
> > > >
> > > > I'm contemplating a system where we transmit metric data at regular
> > > > intervals to kafka. One question I have is whether to generate
> > > > simple messages with very little metadata (just timestamp and
> > > > value), keeping metadata like the name/host/app that generated the
> > > > metric out of the message and embodying it in the name of the topic
> > > > itself instead. Alternatively, we could have a relatively small
> > > > number of topics, with each message including the source metadata
> > > > along with the timestamp and metric value.
> > > >
> > > > 1. On one hand, we'd have a large number of topics (say several
> > > > hundred thousand topics) with small messages, generated at a steady
> > > > rate (say one every 10 seconds).
> > > >
> > > > 2. Alternatively, we could have just a few topics, which receive
> > > > several hundred thousand messages every 10 seconds and contain 2 or
> > > > 3 times more data per message.
> > > >
> > > > I'm wondering if kafka has any performance characteristics that
> > > > differ for the 2 scenarios.
> > > >
> > > > I like #1 because it simplifies targeted message consumption and
> > > > enables more interesting use of TopicFilter'ing. But I'm unsure
> > > > whether there might be performance concerns with kafka (does it
> > > > have to do more work to separately manage each topic?). Is this a
> > > > common use case, or not?
> > > >
> > > > Thanks for any insight.
> > > >
> > > > Jason