Neha, Thanks for your response.
Yes, we really need to specify number of partitions differently for thousand topics. The option num.partitions does a little help in our use case. Some reasons Taylor and I already explained in the previous emails. In addition, we currently implement semantic partitioning to maintain various kinds of produced data. High throughput is needed for important and quickly-produced/processed kinds of data but not for others. What do you think? Thanks, ~Thai On Sat, Mar 3, 2012 at 6:35 AM, Neha Narkhede <[email protected]>wrote: > Thai, > > Do you really need to specify the number of partitions differently for > so many topics ? > I wonder if setting the right default for num.partitions works instead ? > > Thanks, > Neha > > On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <[email protected]> > wrote: > > Hi, > > > > I like the idea Taylor suggested. This will definitely help a lot. > > > > Another approach I would suggest is to let Kafka load information of > > topic.partition.count.map from an external file (plain-text, xml, ect) in > > some format like: > > topic1:#partition > > topic2:#partition > > .... > > topicn:#partition > > > > By this way, a Kafka user will be also able to modify (manually or > > automatically by a script) this information as he/she wants. > > > > What do you think? > > > > Thanks, > > ~Thai > > > > On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <[email protected]> > wrote: > > > >> Jun, > >> > >> No, it's necessary for us to modify the tuning parameters on a per topic > >> basis using wildcards, e.g. > >> > >> topic.flush.intervals.ms=chat*:100,presence*:1000 > >> > >> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <[email protected]> wrote: > >> > >> > Taylor, > >> > > >> > We don't have a jira for that. Please open one. > >> > > >> > In 0.8, we will have DDLs for creating topics, which you can use to > >> > customize # partitions. Will that be enough? > >> > > >> > Jun > >> > > >> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <[email protected]> > >> > wrote: > >> > > >> > > Hi Thai. > >> > > > >> > > Well, actually we didn't solve this problem. We had to use the > global > >> > > topic settings that apply to all topics. > >> > > > >> > > I would really like to see globs (wildcards) supported in the config > >> > > settings. This is something my team and I have discussed on several > >> > > occasions. > >> > > > >> > > I'm not sure if there is a Kafka JIRA to cover that feature⦠> >> > > > >> > > -Taylor > >> > > > >> > > On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <[email protected] > > > >> > > wrote: > >> > > > >> > > > Hi Taylor, > >> > > > > >> > > > I found your email and the Kafka use case by chance. Our use case > is > >> a > >> > > > little similar to yours. We actually implement semantic > partitioning > >> to > >> > > > maintain some kind of produced data and we are also running > several > >> > > > thousand topics as you. > >> > > > > >> > > > One issue we have been facing is that it is totally inconvenient > for > >> us > >> > > to > >> > > > maintain and update Kafka server configuration (server.properties) > >> when > >> > > > running several thousand topics. We have to put number of > partitions > >> > on a > >> > > > per-topic in the way Kafka requires: > >> > > > > >> > > > ### Overrides for for the default given by num.partitions on a > >> > per-topic > >> > > > basis > >> > > > topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4 > >> > > > > >> > > > I am almost sure that you did meet this issue I have mentioned, > so I > >> am > >> > > > curious to know how you solved it. > >> > > > > >> > > > Thanks, > >> > > > ~Thai > >> > > > > >> > > > On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier < > [email protected] > >> > > >wrote: > >> > > > > >> > > >> We had to isolate topics to specific servers because we are > running > >> > > >> several hundred thousand topics in aggregate. > >> > > >> > >> > > >> Due to the directory strategy of Kafka it's not feasible to put > that > >> > > >> many topics in every host since they reside in a single > directory. > >> > > >> > >> > > >> An improvement we considered making was to make the data > directory > >> > > >> nested which would have alleviated this problem. We also could > have > >> > > >> tried a different filesystem but we weren't confident that would > >> solve > >> > > >> the problem entirely. > >> > > >> > >> > > >> The advantage to our solution is that each host in our Kafka > tier is > >> > > >> literally share nothing. It will scale horizontally for a long, > long > >> > > >> way. > >> > > >> > >> > > >> And it's also a contingency plan. Since Kafka was unproven (for > us > >> > > >> anyway at the time) it was easier to build smaller components > with > >> > > >> less overall functionality and glue them together in a scalable > way. > >> > > >> If we had had to we could have out a different message bus in > place. > >> > > >> But we didn't want to do that if we could avoid it :) > >> > > >> > >> > > >> > >> > > >> > >> > > >> On Dec 6, 2011, at 9:13 AM, Neha Narkhede < > [email protected]> > >> > > >> wrote: > >> > > >> > >> > > >> > Taylor, > >> > > >> > > >> > > >> > This sounds great ! Congratulations on this launch. > >> > > >> > > >> > > >> >>> But basically we have many topics, few messages (relatively) > per > >> > > topic > >> > > >> > > >> > > >> > Can you explain your strategy of mapping topics to brokers ? > The > >> > > >> default in > >> > > >> > Kafka today is to have all brokers host all topics. > >> > > >> > > >> > > >> >>> An end user browser makes a long-poll event http connection > to > >> > > receive > >> > > >> > 1:1 messages and 1:M messages from a specialized http server > we > >> > built > >> > > >> for > >> > > >> > this purpose. 1:M messages are delivered from Kafka. > >> > > >> > > >> > > >> > What do you use for receiving 1:1 messages ? > >> > > >> > > >> > > >> > Your use case is interesting and different. It will be great if > >> you > >> > > add > >> > > >> > relevant details here - > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By > >> > > >> > > >> > > >> > Thanks, > >> > > >> > Neha > >> > > >> > > >> > > >> > > >> > > >> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <[email protected]> > wrote: > >> > > >> > > >> > > >> >> Hi, Taylor, > >> > > >> >> > >> > > >> >> Thanks for the update. This is great. Could you update your > usage > >> > in > >> > > >> Kafka > >> > > >> >> wiki? Also, do you delete topics online? If so, how do you do > >> that? > >> > > >> >> > >> > > >> >> Jun > >> > > >> >> > >> > > >> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier < > >> > [email protected]> > >> > > >> >> wrote: > >> > > >> >> > >> > > >> >>> I've already mentioned this before, but I wanted to give a > quick > >> > > >> shout to > >> > > >> >>> let you guys know that our newest game, Deckadence, is 100% > live > >> > as > >> > > of > >> > > >> >>> yesterday. > >> > > >> >>> > >> > > >> >>> Check it out at http://www.tagged.com/deckadence.html > >> > > >> >>> > >> > > >> >>> A little about our use case: > >> > > >> >>> > >> > > >> >>> - Deckadence is a game of buying and selling - or rather > >> trading > >> > - > >> > > >> >>> cards. Every user on Tagged owns a card. There are 100M > uses > >> on > >> > > >> >> Tagged, > >> > > >> >>> so that means there are 100M cards to trade. > >> > > >> >>> - Kafka enables real-time delivery of events in the game > >> > > >> >>> - An end user browser makes a long-poll event http > connection > >> to > >> > > >> >> receive > >> > > >> >>> 1:1 messages and 1:M messages from a specialized http > server we > >> > > built > >> > > >> >> for > >> > > >> >>> this purpose. 1:M messages are delivered from Kafka. > >> > > >> >>> - Because of this design, we can publish a message anywhere > >> > inside > >> > > >> our > >> > > >> >>> datacenter and send it directly and immediately to any other > >> > system > >> > > >> >> that > >> > > >> >>> is > >> > > >> >>> subscribed to Kafka, or to an end-user browser > >> > > >> >>> - Every update event for every card is sent to a unique > topic > >> > that > >> > > >> >>> represents the users card. > >> > > >> >>> - When a user is browsing any card or list of cards - say a > >> > search > >> > > >> >>> result - their browser subscribes to all of the cards on > >> screen. > >> > > >> >>> - The effect of this is that any changes to any card seen > >> > on-screen > >> > > >> are > >> > > >> >>> seen in real-time by all users of the game > >> > > >> >>> - Our primary producers and consumers are PHP and NodeJS, > >> > > >> respectively > >> > > >> >>> > >> > > >> >>> Well, I plan to write up more about this use case in the near > >> > > future. > >> > > >> As > >> > > >> >>> you might have guessed, this is just about as far away from > the > >> > > >> original > >> > > >> >>> intent of Kafka as you could get - we have PHP that sends > >> messages > >> > > to > >> > > >> >>> Kafka. Since it's not good to hold a TCP connection open in > >> PHP, > >> > we > >> > > >> had > >> > > >> >> to > >> > > >> >>> do some trickery here. There was no existing Node client so > we > >> > had > >> > > to > >> > > >> >>> write our own. And since there are 100 million users > registered > >> > on > >> > > >> >> Tagged, > >> > > >> >>> that means we could have in theory 100M topics. Of course in > >> > > >> practice we > >> > > >> >>> have far fewer than that. One of the main things we > currently > >> > have > >> > > >> to do > >> > > >> >>> is aggressively clean topics. But basically we have many > >> topics, > >> > > few > >> > > >> >>> messages (relatively) per topic. And order matters, so we > had > >> to > >> > > deal > >> > > >> >> with > >> > > >> >>> ensuring that we could handle the number of topics we would > >> > create, > >> > > >> and > >> > > >> >>> ensure ordered delivery and receipt. > >> > > >> >>> > >> > > >> >>> In the future I have big plans for Kafka, another feature is > >> > > >> currently in > >> > > >> >>> private test and will be released to the public soon (it uses > >> > Kafka > >> > > >> in a > >> > > >> >>> more traditional way). And we hope to have many more in > 2012... > >> > > >> >>> > >> > > >> >> > >> > > >> > >> > > > > >> > > > > >> > > > >> > > >> >
