Mathias,

What matters is the total number of partitions, since each corresponds to a separate directory on disk. It doesn't matter how many topics those partitions are from.

Thanks,
Jun
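To make that concrete: each topic-partition lives in a directory named <topic>-<partition> under the broker's log directory, so 300 topics with one partition each and 1 topic with 300 partitions both produce 300 directories. A minimal sketch (plain Java; the log path is hypothetical, not Kafka code) that just counts them:

    import java.io.File;

    public class CountPartitionDirs {
        public static void main(String[] args) {
            // Hypothetical log directory; set via log.dir in server.properties.
            File logDir = new File("/tmp/kafka-logs");
            File[] entries = logDir.listFiles();
            int partitionDirs = 0;
            if (entries != null) {
                for (File f : entries) {
                    // Every topic-partition is one directory: <topic>-<partition>.
                    if (f.isDirectory()) partitionDirs++;
                }
            }
            System.out.println("partition directories: " + partitionDirs);
        }
    }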
On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <mathias.soederb...@gmail.com> wrote:

Hey all,

This is a quite interesting topic (no pun intended), and I've seen it come up at least once before.

A friend and I started experimenting with Kafka and ZooKeeper a little while ago (building a publisher / subscriber system with consistent hashing and whatnot), and currently we're using around 300 topics, all with one partition each. So far we haven't really done any serious performance testing, but I'm planning to do so in the following weeks. But I've got a few questions regardless:

Does / should it make any difference in performance when one has a lot of topics compared to having one topic with a lot of partitions? I'm imagining that the system still needs to keep the same number of file descriptors open, but I'm not sure how this would affect reads and writes. Are we going to run into more random reads and writes by using a lot of topics compared to using one topic with a lot of partitions instead? I can't really wrap my head around this right now, mostly because of my rather limited knowledge of how disks and page caches work.

I could also add that we're mostly doing sequential reads (in rare cases we have to rewind a topic) and that the number of topics doesn't change.

On 11 October 2012 05:13, Taylor Gautier <tgaut...@gmail.com> wrote:

We used pattern #1 at Tagged. I wouldn't recommend it unless you're really committed. It took a lot of work to get it working right.

a) Performance degraded non-linearly (read: it fell off a cliff) when brokers were managing more than about 20k topics. This was on a Linux RHEL 5.3 system with EXT3. YMMV.

b) Startup time is significantly longer for a broker that is restarted, due to communication with ZK to sync up on those topics.

c) If topics are short-lived, even if Kafka expires the data segments using its standard 0.7 cleaner, the directory name for the topic will still exist on disk and the topic is still considered "active" (in memory) in Kafka. This causes problems - see (a) above (open file handles).

d) Message latency is affected. Kafka syncs messages to disk if x messages have buffered in memory, or y seconds have elapsed (both configurable). If you have few topics and many messages (pattern #2), you will be hitting the x limit quite often and get good throughput. However, if you have many topics and few messages per topic (pattern #1), you will have to rely on the y threshold to flush to disk, and setting this too low can impact performance (throughput) in a significant way. Jay already mentioned this as random writes.
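A sketch of that count-or-time flush policy (illustrative only, not Kafka's actual code; thresholds invented). With one busy topic the x limit fires constantly; spread the same write volume over 20k topics and only the y timeout ever fires, one small flush per topic:

    public class FlushPolicySketch {
        static final int X_MESSAGES = 1000;  // "x": flush after this many appends
        static final long Y_MILLIS = 3000;   // "y": or after this much elapsed time

        static int unflushed = 0;
        static long lastFlush = System.currentTimeMillis();

        // Called on every append; true means the log should be synced to disk.
        static boolean shouldFlush() {
            unflushed++;
            long now = System.currentTimeMillis();
            if (unflushed >= X_MESSAGES || now - lastFlush >= Y_MILLIS) {
                unflushed = 0;
                lastFlush = now;
                return true;
            }
            return false;
        }

        public static void main(String[] args) {
            for (int i = 0; i < 5000; i++) {
                if (shouldFlush()) System.out.println("flush at append " + i);
            }
        }
    }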
We had to implement a number of solutions ourselves to resolve these issues, namely:

#1 No ZK. This means that all of the automatic partitioning done by Kafka is not available, so we had to roll our own (luckily Tagged is pretty used to scaling systems, so there was much in-house expertise). The solution here was to implement an R/W proxy layer of machines to intercept messages and read/write to/from Kafka, handling the sharding at the proxy layer. Because most of our messages were coming from PHP and we didn't want to use TCP, we needed a UDP/TCP bridge/proxy anyway, so this wasn't a huge deal (also, we wanted strict ordering of messages, so we needed a shard-by-topic feature anyway; I believe this can be done in 0.7, but we started with 0.6).

#2 Custom cleaner. We implemented an extra cleanup task inside the Kafka process that could completely remove a topic from memory and disk. For clients, this sometimes meant that a subscribed topic suddenly changed its physical offset from some offset X to 0, but that's OK: while in practice it would probably never happen, clients should have to handle this case anyway because the Kafka physical message space is limited to 64 bits (again, unlikely to ever wrap in practice, but you never know). Anyway, it's pretty easy to handle: just catch the "invalid offset" error Kafka gives and start at 0.
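That recovery could look roughly like this (a self-contained sketch; the exception type and fetch call are stand-ins for the real 0.6/0.7 consumer API, not actual Kafka classes):

    public class OffsetResetSketch {
        static class InvalidOffsetException extends RuntimeException {}

        // Placeholder for a real fetch; throws when the requested offset no
        // longer exists (e.g. the topic was cleaned and recreated at 0).
        static String fetch(long offset) {
            if (offset > 0) throw new InvalidOffsetException();
            return "message@" + offset;
        }

        public static void main(String[] args) {
            long offset = 42; // a saved offset that is now stale
            String msg;
            try {
                msg = fetch(offset);
            } catch (InvalidOffsetException e) {
                offset = 0;   // topic was reset: start again from the beginning
                msg = fetch(offset);
            }
            System.out.println(msg);
        }
    }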
#3 Low threshold for flush. This gave us good latency, but poor throughput (relatively speaking). We had more than enough throughput, but it was nowhere near what Kafka can do when set up in pattern #2.

Given that you want to manage "hundreds of thousands of topics", that may mean that you would need tens of Kafka brokers, which could be another source of problems - it's more cost, more management, and more sources of failure. SSDs may help solve this problem btw, but now you are talking expensive machines rather than just off-the-shelf cheapo servers with standard SATA drives.

On Wed, Oct 10, 2012 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Yes, the footprint of a topic is one directory per partition (a topic can have many partitions). Each directory contains one or more files (depending on how much data you are retaining and the segment size, both configurable).

In addition to having lots of open files, which certainly scales up to the hundreds of thousands, this will also impact the I/O pattern. As the number of files increases, the data written to each file necessarily decreases. This likely means lots of random I/O. The OS can group together writes, but if you're only doing a single write per topic every now and then, there will be nothing to group and you will get lots of small random I/O. This will definitely impact throughput. I don't know where the practical limits are; we have tested up to ~500 topics and see reasonable performance. We have not done serious performance testing with tens of thousands of topics or more.

In addition to the filesystem concerns, there is metadata kept for each partition in ZK, and I believe ZK keeps this metadata in memory.

-Jay

On Wed, Oct 10, 2012 at 4:12 PM, Jason Rosenberg <j...@squareup.com> wrote:

Ok,

Perhaps for the sake of argument, consider the question if we have just 1 Kafka broker. It sounds like it will need to keep a file handle open for each topic? Is that right?

Jason

On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

Hi Jason,

We use option #2 at LinkedIn for metrics and tracking data. Supporting option #1 in Kafka 0.7 has its challenges, since every topic is stored on every broker, by design. Hence, the number of topics a cluster can support is limited by the I/O and number of open file handles on each broker. After Kafka 0.8 is released, the distribution of topics to brokers is user-defined and can scale out with the number of brokers. Having said that, some Kafka users have successfully deployed Kafka 0.7 clusters hosting a very high number of topics. I hope they can share their experiences here.

Thanks,
Neha

On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <j...@squareup.com> wrote:

Hi,

I'm exploring using Kafka for the first time.

I'm contemplating a system where we transmit metric data to Kafka at regular intervals. One question I have is whether to generate simple messages with very little metadata (just timestamp and value), keeping metadata like the name/host/app that generated the metric out of the message and embodying it in the name of the topic itself instead. Alternatively, we could have a relatively small number of topics, which contain messages that include source metadata along with the timestamp and metric value in each message.

1. On one hand, we'd have a large number of topics (say several hundred thousand topics) with small messages, generated at a steady rate (say one every 10 seconds).

2. Alternatively, we could have just a few topics, which receive several hundred thousand messages every 10 seconds and contain 2 or 3 times more data per message.

I'm wondering if Kafka has any performance characteristics that differ for the two scenarios.

I like #1 because it simplifies targeted message consumption and enables more interesting use of TopicFilter'ing. But I'm unsure whether there might be performance concerns with Kafka (does it have to do more work to separately manage each topic?). Is this a common use case, or not?

Thanks for any insight.

Jason
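For reference, the two patterns differ only in where the metric's identity lives: in the topic name (pattern #1) or inside each message (pattern #2). A minimal sketch (all topic, host, and field names hypothetical):

    public class MetricPatterns {
        public static void main(String[] args) {
            long ts = System.currentTimeMillis();
            double value = 0.97;

            // Pattern #1: identity encoded in the topic name, tiny payload.
            String topic1 = "metrics.web42.myapp.cpu_load";
            String payload1 = ts + ":" + value;

            // Pattern #2: few shared topics, identity carried in each message.
            String topic2 = "metrics";
            String payload2 = "host=web42 app=myapp name=cpu_load ts=" + ts
                    + " value=" + value;

            System.out.println(topic1 + " -> " + payload1);
            System.out.println(topic2 + " -> " + payload2);
        }
    }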