Hi Taylor, thanks for your reply. I'd love to read your blog post about your experiences with it, especially around hardware configuration and how you consume the data (few/many short/long-lived processes, average throughput per topic). The cleanup script seems really useful too; I was considering writing one that also cleans dead topics off ZooKeeper.
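A minimal sketch of what such a cleanup script might look like. This is not the Tagged cleaner; it assumes each topic lives in its own subdirectory under a single log root, and that a topic counts as "dead" when its directory holds only zero-byte files and nothing in it has been modified for a given idle period. The function names, the `max_idle_seconds` parameter, and the directory layout are all hypothetical. It does not touch ZooKeeper.

```python
import os
import shutil
import time


def is_dead(path, max_idle_seconds, now):
    """True if the directory holds no data and has been idle long enough."""
    newest = os.path.getmtime(path)
    for name in os.listdir(path):
        child = os.path.join(path, name)
        # Any non-empty file, or anything that is not a plain file,
        # is treated as live data and the directory is left alone.
        if not os.path.isfile(child) or os.path.getsize(child) > 0:
            return False
        newest = max(newest, os.path.getmtime(child))
    return now - newest >= max_idle_seconds


def clean_dead_topics(log_root, max_idle_seconds, dry_run=True, now=None):
    """Return the names of dead topic directories; delete them unless dry_run."""
    now = time.time() if now is None else now
    dead = []
    for name in sorted(os.listdir(log_root)):
        path = os.path.join(log_root, name)
        if os.path.isdir(path) and is_dead(path, max_idle_seconds, now):
            dead.append(name)
            if not dry_run:
                shutil.rmtree(path)
    return dead
```

Running it with `dry_run=True` first only lists the candidates. Whether it is safe to remove directories under a running broker is something to verify separately.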
Thanks!
Lorenzo

On Tue, Jul 31, 2012 at 8:58 PM, Taylor Gautier <tgaut...@tagged.com> wrote:

> Yes, we have done so at Tagged. I chronicled a bit of our experience here
> on the mailing list. Effectively we found that a single machine could
> not go above ~20k total topics. This could be OS dependent however (we use
> CentOS 5.x)
>
> Various tweaks we made to go further:
>
> 1. a beefed up node.js kafka client/producer implementation -
>    https://github.com/tagged/node-kafka lies at the heart of our kafka
>    deployment
> 2. our own kafka software load balancer (implemented using said library)
>    that shards out independent Kafka instances (guarantees in-order
>    delivery per topic and scales the # of kafka topics linearly as a
>    function of the # of kafka machines)
> 3. a continuous cleaner that removes old dead topics completely from the
>    filesystem (0.7 cleaner leaves empty directory/file which eats up open
>    file handles and limits max # of topics)
> 4. (coming soon) a hierarchical topic directory structure to ease the
>    pain of too many directories/files in a single directory (should help
>    the ~20k number, though probably by less than you might imagine)
>
> On our todo list is blogging about this in more detail, and contributing
> back more than just the node.js implementation.
>
> On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <l.alber...@gmail.com> wrote:
>
> > Is there anyone who tried Kafka with thousands of concurrent topics?
> > If so, what are your experiences? How did you tune it?
> >
> > Thanks!
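
The per-topic sharding described in item 2 of the quoted message could be sketched roughly as follows. This is only one possible approach, not necessarily what the Tagged load balancer does: it assumes a fixed list of independent Kafka instances and picks one by a stable hash of the topic name, so every message for a topic goes to the same instance (which is what keeps per-topic ordering). The function name and the broker addresses are hypothetical.

```python
import hashlib


def broker_for_topic(topic, brokers):
    """Map a topic name to one broker from a fixed, ordered list."""
    if not brokers:
        raise ValueError("need at least one broker")
    # md5 is used only as a stable hash; Python's built-in hash() is
    # randomized per process and would not give a consistent mapping.
    digest = hashlib.md5(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


# Hypothetical usage: three independent Kafka instances.
brokers = ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
target = broker_for_topic("user-12345-events", brokers)
```

With plain modulo hashing, changing the number of brokers remaps most topics, so a real deployment would need a plan for that (for example, consistent hashing or an explicit topic-to-broker table).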