Hi Taylor,

I think you are correct that single-node scalability in the number of topics is not that great, because each topic has multiple files. I think the large-directory problem can probably be mitigated by using a more modern filesystem, but, as you and Jun point out, ZK may also be strained.
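The large-directory problem has another mitigation besides a different filesystem: the tree-of-directories layout proposed in Taylor's original message. Below is a minimal sketch of that layout. The helper name `topic_dir` is hypothetical, and so is the MD5-based two-level fan-out. This is not Kafka's actual log layout.

```python
import hashlib
from pathlib import PurePosixPath


def topic_dir(root: str, topic: str, partition: int, fanout_levels: int = 2) -> PurePosixPath:
    """Map a topic partition to a nested directory under `root` (hypothetical layout).

    Each level is named by two hex characters of an MD5 digest of the topic
    name, giving 256 buckets per level. All partitions of one topic therefore
    share a bucket, and no single directory holds more than a small fraction
    of the topics.
    """
    digest = hashlib.md5(topic.encode("utf-8")).hexdigest()
    buckets = [digest[2 * i: 2 * i + 2] for i in range(fanout_levels)]
    return PurePosixPath(root, *buckets, f"{topic}-{partition}")


if __name__ == "__main__":
    # Prints something like /var/kafka-logs/<xx>/<yy>/user-42-0
    print(topic_dir("/var/kafka-logs", "user-42", 0))
```

With two levels of 256 buckets there are 65,536 leaf directories. Even 1M topics would then average about 15 topic directories per leaf (1,000,000 / 65,536 ≈ 15.3). Hashing keeps the buckets balanced even when many topic names share a prefix. The broker would have to apply the same function whenever it looks up an existing log.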
One thing that may not be obvious is that it is not required to keep all topics on all machines; this will help scale the non-ZK aspects. To do this you can either pre-create the topics or add a custom partitioner that maps particular topics to only a subset of machines. For example, if you had 15 machines you could spread each topic over 3 of them and get 5X the maximum number of topics, since each machine would then hold only a fifth of the topics.

-Jay

On Fri, Jul 22, 2011 at 2:06 PM, Jun Rao wrote:

> Hi, Taylor,
>
> That's a good question. As you pointed out, a large number of topics will
> put stress on the local file directory and on ZK. Maybe you can do a bit of
> testing first to see what breaks with a large number of topics. After that,
> we can look into what needs to be fixed. Making the directory structure
> hierarchical is a possibility.
>
> Thanks,
>
> Jun
>
> On Fri, Jul 22, 2011 at 1:23 PM, Taylor Gautier wrote:
>
> > Hi.
> >
> > I am thinking of using Kafka to send/receive messages for a large number
> > of topics, on the order of 100k - 1M.
> >
> > It seems that the directory structure used for topics will probably not
> > work for this usage. I'm also not sure whether the in-memory data
> > structures might suffer, and it may be problematic for ZooKeeper as well.
> >
> > One thought I have is to modify the directory structure to be a tree of
> > directories. I'm not sure what, if anything, might need to be done to the
> > in-memory structures or the ZooKeeper info.
> >
> > Any thoughts?
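The topic-to-machine mapping Jay describes can be sketched as follows. This is a minimal, language-neutral illustration in Python, not Kafka's actual partitioner API. The function names, the CRC32 hash, and the fixed disjoint groups of 3 brokers are all assumptions. Each topic is pinned to one group of brokers, so with 15 brokers in 5 groups each broker hosts about 3/15 = 1/5 of the topics. That is where the 5X figure comes from.

```python
import zlib

# Hypothetical helpers for illustration; not part of any Kafka API.


def broker_groups(brokers: list[int], group_size: int) -> list[list[int]]:
    """Split brokers into disjoint groups of `group_size` brokers each."""
    if len(brokers) % group_size != 0:
        raise ValueError("broker count must be a multiple of group_size")
    return [brokers[i:i + group_size] for i in range(0, len(brokers), group_size)]


def brokers_for_topic(topic: str, groups: list[list[int]]) -> list[int]:
    """Pin a topic to one broker group using a stable hash of its name."""
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash(), which is salted per process for str.
    return groups[zlib.crc32(topic.encode("utf-8")) % len(groups)]


def broker_for_message(topic: str, key: bytes, groups: list[list[int]]) -> int:
    """Choose a broker within the topic's group; equal keys go to the same broker."""
    group = brokers_for_topic(topic, groups)
    return group[zlib.crc32(key) % len(group)]


if __name__ == "__main__":
    groups = broker_groups(list(range(15)), 3)  # 15 brokers -> 5 groups of 3
    topics_per_broker = {b: 0 for b in range(15)}
    for i in range(100_000):
        for b in brokers_for_topic(f"topic-{i}", groups):
            topics_per_broker[b] += 1
    print(max(topics_per_broker.values()), "topics on the busiest broker")
```

Run as a script, the demo counts how many of 100,000 topics land on each broker. If CRC32 spreads the names evenly, the busiest broker should hold roughly 20,000 topics rather than all 100,000. Fixed disjoint groups keep the arithmetic simple. A consistent or rendezvous hash would reduce reshuffling when brokers are added, at the cost of a little more code.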
