Thai, Do you really need to specify the number of partitions differently for so many topics ? I wonder if setting the right default for num.partitions works instead ?
Thanks, Neha On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <[email protected]> wrote: > Hi, > > I like the idea Taylor suggested. This will definitely help a lot. > > Another approach I would suggest is to let Kafka load information of > topic.partition.count.map from an external file (plain-text, xml, ect) in > some format like: > topic1:#partition > topic2:#partition > .... > topicn:#partition > > By this way, a Kafka user will be also able to modify (manually or > automatically by a script) this information as he/she wants. > > What do you think? > > Thanks, > ~Thai > > On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <[email protected]> wrote: > >> Jun, >> >> No, it's necessary for us to modify the tuning parameters on a per topic >> basis using wildcards, e.g. >> >> topic.flush.intervals.ms=chat*:100,presence*:1000 >> >> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <[email protected]> wrote: >> >> > Taylor, >> > >> > We don't have a jira for that. Please open one. >> > >> > In 0.8, we will have DDLs for creating topics, which you can use to >> > customize # partitions. Will that be enough? >> > >> > Jun >> > >> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <[email protected]> >> > wrote: >> > >> > > Hi Thai. >> > > >> > > Well, actually we didn't solve this problem. We had to use the global >> > > topic settings that apply to all topics. >> > > >> > > I would really like to see globs (wildcards) supported in the config >> > > settings. This is something my team and I have discussed on several >> > > occasions. >> > > >> > > I'm not sure if there is a Kafka JIRA to cover that feature… >> > > >> > > -Taylor >> > > >> > > On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <[email protected]> >> > > wrote: >> > > >> > > > Hi Taylor, >> > > > >> > > > I found your email and the Kafka use case by chance. Our use case is >> a >> > > > little similar to yours. We actually implement semantic partitioning >> to >> > > > maintain some kind of produced data and we are also running several >> > > > thousand topics as you. >> > > > >> > > > One issue we have been facing is that it is totally inconvenient for >> us >> > > to >> > > > maintain and update Kafka server configuration (server.properties) >> when >> > > > running several thousand topics. We have to put number of partitions >> > on a >> > > > per-topic in the way Kafka requires: >> > > > >> > > > ### Overrides for for the default given by num.partitions on a >> > per-topic >> > > > basis >> > > > topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4 >> > > > >> > > > I am almost sure that you did meet this issue I have mentioned, so I >> am >> > > > curious to know how you solved it. >> > > > >> > > > Thanks, >> > > > ~Thai >> > > > >> > > > On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <[email protected] >> > > >wrote: >> > > > >> > > >> We had to isolate topics to specific servers because we are running >> > > >> several hundred thousand topics in aggregate. >> > > >> >> > > >> Due to the directory strategy of Kafka it's not feasible to put that >> > > >> many topics in every host since they reside in a single directory. >> > > >> >> > > >> An improvement we considered making was to make the data directory >> > > >> nested which would have alleviated this problem. We also could have >> > > >> tried a different filesystem but we weren't confident that would >> solve >> > > >> the problem entirely. >> > > >> >> > > >> The advantage to our solution is that each host in our Kafka tier is >> > > >> literally share nothing. It will scale horizontally for a long, long >> > > >> way. >> > > >> >> > > >> And it's also a contingency plan. Since Kafka was unproven (for us >> > > >> anyway at the time) it was easier to build smaller components with >> > > >> less overall functionality and glue them together in a scalable way. >> > > >> If we had had to we could have out a different message bus in place. >> > > >> But we didn't want to do that if we could avoid it :) >> > > >> >> > > >> >> > > >> >> > > >> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <[email protected]> >> > > >> wrote: >> > > >> >> > > >> > Taylor, >> > > >> > >> > > >> > This sounds great ! Congratulations on this launch. >> > > >> > >> > > >> >>> But basically we have many topics, few messages (relatively) per >> > > topic >> > > >> > >> > > >> > Can you explain your strategy of mapping topics to brokers ? The >> > > >> default in >> > > >> > Kafka today is to have all brokers host all topics. >> > > >> > >> > > >> >>> An end user browser makes a long-poll event http connection to >> > > receive >> > > >> > 1:1 messages and 1:M messages from a specialized http server we >> > built >> > > >> for >> > > >> > this purpose. 1:M messages are delivered from Kafka. >> > > >> > >> > > >> > What do you use for receiving 1:1 messages ? >> > > >> > >> > > >> > Your use case is interesting and different. It will be great if >> you >> > > add >> > > >> > relevant details here - >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By >> > > >> > >> > > >> > Thanks, >> > > >> > Neha >> > > >> > >> > > >> > >> > > >> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <[email protected]> wrote: >> > > >> > >> > > >> >> Hi, Taylor, >> > > >> >> >> > > >> >> Thanks for the update. This is great. Could you update your usage >> > in >> > > >> Kafka >> > > >> >> wiki? Also, do you delete topics online? If so, how do you do >> that? >> > > >> >> >> > > >> >> Jun >> > > >> >> >> > > >> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier < >> > [email protected]> >> > > >> >> wrote: >> > > >> >> >> > > >> >>> I've already mentioned this before, but I wanted to give a quick >> > > >> shout to >> > > >> >>> let you guys know that our newest game, Deckadence, is 100% live >> > as >> > > of >> > > >> >>> yesterday. >> > > >> >>> >> > > >> >>> Check it out at http://www.tagged.com/deckadence.html >> > > >> >>> >> > > >> >>> A little about our use case: >> > > >> >>> >> > > >> >>> - Deckadence is a game of buying and selling - or rather >> trading >> > - >> > > >> >>> cards. Every user on Tagged owns a card. There are 100M uses >> on >> > > >> >> Tagged, >> > > >> >>> so that means there are 100M cards to trade. >> > > >> >>> - Kafka enables real-time delivery of events in the game >> > > >> >>> - An end user browser makes a long-poll event http connection >> to >> > > >> >> receive >> > > >> >>> 1:1 messages and 1:M messages from a specialized http server we >> > > built >> > > >> >> for >> > > >> >>> this purpose. 1:M messages are delivered from Kafka. >> > > >> >>> - Because of this design, we can publish a message anywhere >> > inside >> > > >> our >> > > >> >>> datacenter and send it directly and immediately to any other >> > system >> > > >> >> that >> > > >> >>> is >> > > >> >>> subscribed to Kafka, or to an end-user browser >> > > >> >>> - Every update event for every card is sent to a unique topic >> > that >> > > >> >>> represents the users card. >> > > >> >>> - When a user is browsing any card or list of cards - say a >> > search >> > > >> >>> result - their browser subscribes to all of the cards on >> screen. >> > > >> >>> - The effect of this is that any changes to any card seen >> > on-screen >> > > >> are >> > > >> >>> seen in real-time by all users of the game >> > > >> >>> - Our primary producers and consumers are PHP and NodeJS, >> > > >> respectively >> > > >> >>> >> > > >> >>> Well, I plan to write up more about this use case in the near >> > > future. >> > > >> As >> > > >> >>> you might have guessed, this is just about as far away from the >> > > >> original >> > > >> >>> intent of Kafka as you could get - we have PHP that sends >> messages >> > > to >> > > >> >>> Kafka. Since it's not good to hold a TCP connection open in >> PHP, >> > we >> > > >> had >> > > >> >> to >> > > >> >>> do some trickery here. There was no existing Node client so we >> > had >> > > to >> > > >> >>> write our own. And since there are 100 million users registered >> > on >> > > >> >> Tagged, >> > > >> >>> that means we could have in theory 100M topics. Of course in >> > > >> practice we >> > > >> >>> have far fewer than that. One of the main things we currently >> > have >> > > >> to do >> > > >> >>> is aggressively clean topics. But basically we have many >> topics, >> > > few >> > > >> >>> messages (relatively) per topic. And order matters, so we had >> to >> > > deal >> > > >> >> with >> > > >> >>> ensuring that we could handle the number of topics we would >> > create, >> > > >> and >> > > >> >>> ensure ordered delivery and receipt. >> > > >> >>> >> > > >> >>> In the future I have big plans for Kafka, another feature is >> > > >> currently in >> > > >> >>> private test and will be released to the public soon (it uses >> > Kafka >> > > >> in a >> > > >> >>> more traditional way). And we hope to have many more in 2012... >> > > >> >>> >> > > >> >> >> > > >> >> > > > >> > > > >> > > >> > >>
