Re: Kafka is live in prod @ 100%

Neha Narkhede Fri, 02 Mar 2012 15:35:35 -0800

Thai,

Do you really need to specify the number of partitions differently for
so many topics?
I wonder if setting the right default for num.partitions would work instead?


Thanks,
Neha
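Here is Neha's point in sketch form. If most topics share one partition count, making that count the num.partitions default leaves only the exceptions for the per-topic override map. The helper below is hypothetical and not part of Kafka. It simply picks the most common count as the default.

```python
from collections import Counter
from typing import Dict, Tuple


def split_default(counts: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Choose the most common partition count as the default; keep only the exceptions.

    counts maps topic name -> desired number of partitions (hypothetical input).
    Returns (value for num.partitions, per-topic overrides that differ from it).
    """
    if not counts:
        raise ValueError("no topics given")
    default, _ = Counter(counts.values()).most_common(1)[0]
    overrides = {topic: n for topic, n in counts.items() if n != default}
    return default, overrides


print(split_default({"topic1": 4, "topic2": 4, "topic3": 4, "topic4": 8}))
# prints: (4, {'topic4': 8})
```

Suppose several thousand topics all want 4 partitions. This yields num.partitions=4 and an empty override map, so topic.partition.count.map only needs entries for the outliers.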

On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <[email protected]> wrote:
> Hi,
>
> I like the idea Taylor suggested. This will definitely help a lot.
>
> Another approach I would suggest is to let Kafka load the
> topic.partition.count.map information from an external file (plain text,
> XML, etc.) in a format like:
> topic1:#partition
> topic2:#partition
> ....
> topicn:#partition
>
> This way, a Kafka user would also be able to modify this information as
> needed, either manually or automatically with a script.
>
> What do you think?
>
> Thanks,
> ~Thai
>
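The script-driven workflow Thai describes can be sketched as follows. It assumes a hypothetical plain-text file with one `topic:partitions` entry per line. The script renders that file into the single topic.partition.count.map line that server.properties uses. The file name, the format, and the function names are illustrative. This is Thai's proposal, not an existing Kafka feature.

```python
from typing import Dict, Iterable


def parse_partition_file(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'topic:partitions' lines, skipping blank lines and '#' comments."""
    counts: Dict[str, int] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        topic, sep, count = line.rpartition(":")
        if not sep or not topic.strip():
            raise ValueError(f"line {lineno}: expected 'topic:partitions', got {raw!r}")
        counts[topic.strip()] = int(count)
    return counts


def render_count_map(counts: Dict[str, int]) -> str:
    """Render the overrides as one server.properties entry, sorted by topic name."""
    entries = ", ".join(f"{topic}:{n}" for topic, n in sorted(counts.items()))
    return f"topic.partition.count.map={entries}"


# In practice the lines would come from a file such as a hypothetical
# "partitions.txt"; they are inlined here to keep the sketch self-contained.
sample = ["# topic:partitions", "topic1:4", "topic2:8", ""]
print(render_count_map(parse_partition_file(sample)))
# prints: topic.partition.count.map=topic1:4, topic2:8
```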
> On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <[email protected]> wrote:
>
>> Jun,
>>
>> No, we need to be able to modify the tuning parameters on a per-topic
>> basis using wildcards, e.g.
>>
>> topic.flush.intervals.ms=chat*:100,presence*:1000
>>
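The wildcard syntax Taylor proposes is a feature request, not something Kafka supports in this thread. The following is a minimal sketch of how such a setting could be resolved. It assumes first-match-wins semantics and Unix shell-style globs (Python's fnmatch), with a fallback to a global default. The function names and the default value are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def parse_glob_map(value: str) -> List[Tuple[str, int]]:
    """Parse 'pattern:value,pattern:value' into ordered (glob, value) pairs."""
    rules: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        pattern, sep, setting = entry.rpartition(":")
        if not sep or not pattern:
            raise ValueError(f"expected 'pattern:value', got {entry!r}")
        rules.append((pattern.strip(), int(setting)))
    return rules


def resolve(topic: str, rules: List[Tuple[str, int]], default: int) -> int:
    """Return the value of the first glob matching the topic, else the default."""
    for pattern, setting in rules:
        if fnmatchcase(topic, pattern):  # shell-style glob, case-sensitive
            return setting
    return default


rules = parse_glob_map("chat*:100,presence*:1000")
print(resolve("chat.room42", rules, default=3000))  # prints 100
print(resolve("presence.u7", rules, default=3000))  # prints 1000
print(resolve("cards.123", rules, default=3000))    # prints 3000
```

Order matters under first-match-wins. A broad pattern listed first would shadow narrower ones after it.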
>> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <[email protected]> wrote:
>>
>> > Taylor,
>> >
>> > We don't have a JIRA for that. Please open one.
>> >
>> > In 0.8, we will have DDLs for creating topics, which you can use to
>> > customize the number of partitions. Will that be enough?
>> >
>> > Jun
>> >
>> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <[email protected]>
>> > wrote:
>> >
>> > > Hi Thai.
>> > >
>> > > Well, actually we didn't solve this problem.  We had to use the global
>> > > topic settings that apply to all topics.
>> > >
>> > > I would really like to see globs (wildcards) supported in the config
>> > > settings.  This is something my team and I have discussed on several
>> > > occasions.
>> > >
>> > > I'm not sure if there is a Kafka JIRA to cover that feature…
>> > >
>> > > -Taylor
>> > >
>> > > On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi Taylor,
>> > > >
>> > > > I found your email and the Kafka use case by chance. Our use case is
>> > > > somewhat similar to yours. We actually implement semantic partitioning
>> > > > to maintain some kind of produced data, and, like you, we are running
>> > > > several thousand topics.
>> > > >
>> > > > One issue we have been facing is that it is very inconvenient for us
>> > > > to maintain and update the Kafka server configuration
>> > > > (server.properties) when running several thousand topics. We have to
>> > > > specify the number of partitions on a per-topic basis, in the way
>> > > > Kafka requires:
>> > > >
>> > > > ### Overrides for the default given by num.partitions on a per-topic basis
>> > > > topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
>> > > >
>> > > > I am almost sure you ran into the same issue, so I am curious to know
>> > > > how you solved it.
>> > > >
>> > > > Thanks,
>> > > > ~Thai
>> > > >
>> > > > On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <[email protected]> wrote:
>> > > >
>> > > >> We had to isolate topics to specific servers because we are running
>> > > >> several hundred thousand topics in aggregate.
>> > > >>
>> > > >> Because of Kafka's directory strategy, it's not feasible to put that
>> > > >> many topics on every host, since they all reside in a single directory.
>> > > >>
>> > > >> One improvement we considered was making the data directory nested,
>> > > >> which would have alleviated this problem. We could also have tried a
>> > > >> different filesystem, but we weren't confident that would solve the
>> > > >> problem entirely.
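Kafka keeps one log directory per topic-partition, named `<topic>-<partition>`, directly under the configured log directory. With hundreds of thousands of topics, that one directory grows very large. The following is a sketch of the nested layout Taylor mentions. It assumes a hypothetical scheme that shards directories by a hash of the topic name. Kafka does not support this layout, and the function names and paths are illustrative.

```python
import hashlib
import os


def flat_log_path(log_dir: str, topic: str, partition: int) -> str:
    """Flat layout: every topic-partition directory sits directly under log_dir."""
    return os.path.join(log_dir, f"{topic}-{partition}")


def nested_log_path(log_dir: str, topic: str, partition: int, levels: int = 2) -> str:
    """Hypothetical nested layout: shard directories by a hash of the topic name.

    Each level uses two hex digits (256 buckets), so levels=2 gives 65,536 leaf
    directories and several hundred thousand topics average only a few per leaf.
    """
    digest = hashlib.md5(topic.encode("utf-8")).hexdigest()
    buckets = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(log_dir, *buckets, f"{topic}-{partition}")


print(flat_log_path("/var/kafka-logs", "card.12345", 0))
print(nested_log_path("/var/kafka-logs", "card.12345", 0))
```

The hash covers only the topic name, so all partitions of a topic land in the same leaf directory. Any broker using such a layout would also have to apply the same mapping when it scans its logs at startup.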
>> > > >>
>> > > >> The advantage of our solution is that each host in our Kafka tier is
>> > > >> literally shared-nothing. It will scale horizontally for a long, long
>> > > >> way.
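Taylor does not say how topics are assigned to hosts, and Neha asks about this further down. The following is one hypothetical way to keep each host shared-nothing. It uses rendezvous (highest-random-weight) hashing, so every producer and consumer can agree on a topic's home host without coordination. This is not Tagged's documented method, and the host names are placeholders.

```python
import hashlib
from typing import Sequence


def _score(host: str, topic: str) -> int:
    """Stable 64-bit score for a (host, topic) pair.

    hashlib is used instead of hash(), which is randomized per process for str.
    """
    digest = hashlib.sha1(f"{host}\x00{topic}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def host_for_topic(topic: str, hosts: Sequence[str]) -> str:
    """Rendezvous hashing: the host with the highest score owns the topic."""
    if not hosts:
        raise ValueError("no hosts configured")
    return max(hosts, key=lambda host: _score(host, topic))


hosts = ["kafka1", "kafka2", "kafka3"]
print(host_for_topic("card.12345", hosts))
```

Rendezvous hashing behaves better than `hash(topic) % len(hosts)` when a host is added. Only the topics that the new host now wins move, and every other topic keeps its existing host.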
>> > > >>
>> > > >> It's also a contingency plan. Since Kafka was unproven (for us,
>> > > >> anyway, at the time), it was easier to build smaller components with
>> > > >> less overall functionality and glue them together in a scalable way.
>> > > >> If we had had to, we could have put a different message bus in place.
>> > > >> But we didn't want to do that if we could avoid it :)
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >> > Taylor,
>> > > >> >
>> > > >> > This sounds great! Congratulations on the launch.
>> > > >> >
>> > > >> >>> But basically we have many topics, few messages (relatively) per topic
>> > > >> >
>> > > >> > Can you explain your strategy for mapping topics to brokers? The
>> > > >> > default in Kafka today is to have all brokers host all topics.
>> > > >> >
>> > > >> >>> An end user browser makes a long-poll event http connection to
>> > > >> >>> receive 1:1 messages and 1:M messages from a specialized http
>> > > >> >>> server we built for this purpose.  1:M messages are delivered
>> > > >> >>> from Kafka.
>> > > >> >
>> > > >> > What do you use for receiving 1:1 messages?
>> > > >> >
>> > > >> > Your use case is interesting and different. It would be great if you
>> > > >> > could add the relevant details here:
>> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Neha
>> > > >> >
>> > > >> >
>> > > >> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <[email protected]> wrote:
>> > > >> >
>> > > >> >> Hi, Taylor,
>> > > >> >>
>> > > >> >> Thanks for the update. This is great. Could you add your usage to
>> > > >> >> the Kafka wiki? Also, do you delete topics online? If so, how do you
>> > > >> >> do that?
>> > > >> >>
>> > > >> >> Jun
>> > > >> >>
>> > > >> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier <[email protected]> wrote:
>> > > >> >>
>> > > >> >>> I've already mentioned this before, but I wanted to give a quick
>> > > >> >>> shout-out to let you guys know that our newest game, Deckadence, is
>> > > >> >>> 100% live as of yesterday.
>> > > >> >>>
>> > > >> >>> Check it out at http://www.tagged.com/deckadence.html
>> > > >> >>>
>> > > >> >>> A little about our use case:
>> > > >> >>>
>> > > >> >>>  - Deckadence is a game of buying and selling (or rather, trading)
>> > > >> >>>    cards. Every user on Tagged owns a card. There are 100M users on
>> > > >> >>>    Tagged, so there are 100M cards to trade.
>> > > >> >>>  - Kafka enables real-time delivery of events in the game.
>> > > >> >>>  - An end-user browser makes a long-poll HTTP event connection to
>> > > >> >>>    receive 1:1 and 1:M messages from a specialized HTTP server we
>> > > >> >>>    built for this purpose. 1:M messages are delivered from Kafka.
>> > > >> >>>  - Because of this design, we can publish a message anywhere inside
>> > > >> >>>    our datacenter and send it directly and immediately to any other
>> > > >> >>>    system that is subscribed to Kafka, or to an end-user browser.
>> > > >> >>>  - Every update event for every card is sent to a unique topic that
>> > > >> >>>    represents the user's card.
>> > > >> >>>  - When a user is browsing any card or list of cards (say, a search
>> > > >> >>>    result), their browser subscribes to all of the cards on screen.
>> > > >> >>>  - The effect is that any change to any card on screen is seen in
>> > > >> >>>    real time by all users of the game.
>> > > >> >>>  - Our primary producers and consumers are PHP and NodeJS,
>> > > >> >>>    respectively.
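The per-card topic and on-screen subscription scheme above can be sketched as a small registry inside the long-poll HTTP server. This is an illustration only. The `card.<user id>` topic naming, the class, and its methods are hypothetical. The sketch is written in Python, whereas the actual server used NodeJS.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


def card_topic(user_id: int) -> str:
    """Hypothetical naming scheme: one topic per user's card."""
    return f"card.{user_id}"


class SubscriptionRegistry:
    """Tracks which long-poll connections are watching which card topics."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        self._topics_by_conn: DefaultDict[str, Set[str]] = defaultdict(set)

    def show_cards(self, conn_id: str, user_ids: Iterable[int]) -> None:
        """Replace a connection's subscriptions with the cards now on screen."""
        self.disconnect(conn_id)
        for uid in user_ids:
            topic = card_topic(uid)
            self._subscribers[topic].add(conn_id)
            self._topics_by_conn[conn_id].add(topic)

    def disconnect(self, conn_id: str) -> None:
        """Drop every subscription held by a connection."""
        for topic in self._topics_by_conn.pop(conn_id, set()):
            subscribers = self._subscribers[topic]
            subscribers.discard(conn_id)
            if not subscribers:
                del self._subscribers[topic]  # no one is watching this card anymore

    def recipients(self, topic: str) -> List[str]:
        """Connections that should receive a message consumed from `topic`."""
        return sorted(self._subscribers.get(topic, ()))


registry = SubscriptionRegistry()
registry.show_cards("conn-a", [1, 2])      # browser A sees cards 1 and 2
registry.show_cards("conn-b", [2, 3])      # browser B sees cards 2 and 3
print(registry.recipients(card_topic(2)))  # prints ['conn-a', 'conn-b']
```

Each message consumed from a card's topic is then written to every connection returned by `recipients`. This fan-out is what makes a change to a card visible to everyone viewing it.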
>> > > >> >>>
>> > > >> >>> Well, I plan to write up more about this use case in the near future.
>> > > >> >>> As you might have guessed, this is just about as far away from the
>> > > >> >>> original intent of Kafka as you could get: we have PHP that sends
>> > > >> >>> messages to Kafka. Since it's not good to hold a TCP connection open
>> > > >> >>> in PHP, we had to do some trickery here. There was no existing Node
>> > > >> >>> client, so we had to write our own. And since there are 100 million
>> > > >> >>> users registered on Tagged, we could in theory have 100M topics. Of
>> > > >> >>> course, in practice we have far fewer than that. One of the main
>> > > >> >>> things we currently have to do is aggressively clean up topics. But
>> > > >> >>> basically we have many topics and relatively few messages per topic.
>> > > >> >>> And order matters, so we had to make sure we could handle the number
>> > > >> >>> of topics we would create and ensure ordered delivery and receipt.
>> > > >> >>>
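The following sketches one way to pick topics for aggressive cleanup. It assumes the service records a last-activity timestamp per topic. The TTL policy and the names are hypothetical. Actually deleting a topic from the brokers is a separate step that is not shown here, and Jun asks above how that is done.

```python
from typing import Dict, List


def idle_topics(last_activity: Dict[str, float], now: float, ttl_seconds: float) -> List[str]:
    """Return topics with no activity in the last ttl_seconds, sorted by name.

    last_activity maps topic -> Unix timestamp of its most recent message or subscriber.
    """
    cutoff = now - ttl_seconds
    return sorted(topic for topic, ts in last_activity.items() if ts < cutoff)


activity = {"card.1": 1_000.0, "card.2": 4_500.0, "card.3": 4_900.0}
print(idle_topics(activity, now=5_000.0, ttl_seconds=600.0))  # prints ['card.1']
```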
>> > > >> >>> In the future I have big plans for Kafka: another feature is
>> > > >> >>> currently in private testing and will be released to the public soon
>> > > >> >>> (it uses Kafka in a more traditional way). And we hope to have many
>> > > >> >>> more in 2012...
>> > > >> >>>
>> > > >> >>
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
