Re: Kafka is live in prod @ 100%

Taylor Gautier Fri, 17 Feb 2012 09:02:33 -0800

Hi Thai.

Well, actually we didn't solve this problem.  We had to use the global
topic settings that apply to all topics.


I would really like to see globs (wildcards) supported in the config
settings.  This is something my team and I have discussed on several
occasions.

I'm not sure if there is a Kafka JIRA to cover that feature…
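
Until something like that exists, one possible workaround is to generate the override line from a few glob rules instead of maintaining it by hand. A minimal sketch in Python, assuming the full list of topic names is known up front. The function `build_partition_map`, the rule list, and the sample topic names are all made up for illustration and are not part of Kafka:

```python
from fnmatch import fnmatchcase


def build_partition_map(topics, rules):
    """Expand (glob, partition_count) rules into one explicit override line.

    The first matching rule wins. Topics that match no rule are left out,
    so the global num.partitions default would apply to them.
    """
    entries = []
    for topic in topics:
        for pattern, count in rules:
            if fnmatchcase(topic, pattern):
                entries.append(f"{topic}:{count}")
                break
    return "topic.partition.count.map = " + ", ".join(entries)


# Hypothetical topic names and rules, for illustration only.
topics = ["card.1001", "card.1002", "chat.lobby", "audit"]
rules = [("card.*", 4), ("chat.*", 2)]
print(build_partition_map(topics, rules))
```

The printed line could then be pasted (or templated) into server.properties. This only moves the bookkeeping into a script; the broker would still see one explicit entry per topic.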

-Taylor

On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <[email protected]> wrote:

> Hi Taylor,
>
> I found your email and the Kafka use case by chance. Our use case is
> somewhat similar to yours. We implement semantic partitioning to
> maintain some kind of produced data and, like you, we are also running
> several thousand topics.
>
> One issue we have been facing is that it is very inconvenient for us to
> maintain and update the Kafka server configuration (server.properties)
> when running several thousand topics. We have to set the number of
> partitions on a per-topic basis in the way Kafka requires:
>
> ### Overrides for the default given by num.partitions on a per-topic basis
> topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
>
> I am almost sure that you ran into the issue I have described, so I am
> curious to know how you solved it.
>
> Thanks,
> ~Thai
>
> On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <[email protected]> wrote:
>
>> We had to isolate topics to specific servers because we are running
>> several hundred thousand topics in aggregate.
>>
>> Due to the directory strategy of Kafka it's not feasible to put that
>> many topics in every host since they reside in a single directory.
>>
>> An improvement we considered making was to make the data directory
>> nested which would have alleviated this problem.  We also could have
>> tried a different filesystem but we weren't confident that would solve
>> the problem entirely.
>>
>> The advantage to our solution is that each host in our Kafka tier is
>> literally shared-nothing. It will scale horizontally for a long, long
>> way.
>>
>> And it's also a contingency plan. Since Kafka was unproven (for us
>> anyway at the time) it was easier to build smaller components with
>> less overall functionality and glue them together in a scalable way.
>> If we had had to we could have put a different message bus in place.
>> But we didn't want to do that if we could avoid it :)
>>
>>
>>
>> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <[email protected]>
>> wrote:
>>
>> > Taylor,
>> >
>> > This sounds great ! Congratulations on this launch.
>> >
>> >>> But basically we have many topics, few messages (relatively) per topic
>> >
>> > Can you explain your strategy of mapping topics to brokers ? The
>> > default in Kafka today is to have all brokers host all topics.
>> >
>> >>> An end user browser makes a long-poll event http connection to
>> >>> receive 1:1 messages and 1:M messages from a specialized http server
>> >>> we built for this purpose.  1:M messages are delivered from Kafka.
>> >
>> > What do you use for receiving 1:1 messages ?
>> >
>> > Your use case is interesting and different. It will be great if you add
>> > relevant details here -
>> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
>> >
>> > Thanks,
>> > Neha
>> >
>> >
>> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <[email protected]> wrote:
>> >
>> >> Hi, Taylor,
>> >>
>> >> Thanks for the update. This is great. Could you update your usage in
>> >> the Kafka wiki? Also, do you delete topics online? If so, how do you
>> >> do that?
>> >>
>> >> Jun
>> >>
>> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier <[email protected]>
>> >> wrote:
>> >>
>> >>> I've already mentioned this before, but I wanted to give a quick
>> >>> shout to let you guys know that our newest game, Deckadence, is 100%
>> >>> live as of yesterday.
>> >>>
>> >>> Check it out at http://www.tagged.com/deckadence.html
>> >>>
>> >>> A little about our use case:
>> >>>
>> >>>  - Deckadence is a game of buying and selling - or rather trading -
>> >>>  cards.  Every user on Tagged owns a card.  There are 100M users on
>> >>>  Tagged, so that means there are 100M cards to trade.
>> >>>  - Kafka enables real-time delivery of events in the game
>> >>>  - An end user browser makes a long-poll event http connection to
>> >>>  receive 1:1 messages and 1:M messages from a specialized http server
>> >>>  we built for this purpose.  1:M messages are delivered from Kafka.
>> >>>  - Because of this design, we can publish a message anywhere inside
>> >>>  our datacenter and send it directly and immediately to any other
>> >>>  system that is subscribed to Kafka, or to an end-user browser
>> >>>  - Every update event for every card is sent to a unique topic that
>> >>>  represents the user's card.
>> >>>  - When a user is browsing any card or list of cards - say a search
>> >>>  result - their browser subscribes to all of the cards on screen.
>> >>>  - The effect of this is that any changes to any card seen on-screen
>> >>>  are seen in real-time by all users of the game
>> >>>  - Our primary producers and consumers are PHP and NodeJS,
>> >>>  respectively
>> >>>
>> >>> Well, I plan to write up more about this use case in the near future.
>> >>> As you might have guessed, this is just about as far away from the
>> >>> original intent of Kafka as you could get - we have PHP that sends
>> >>> messages to Kafka.  Since it's not good to hold a TCP connection open
>> >>> in PHP, we had to do some trickery here.  There was no existing Node
>> >>> client so we had to write our own.  And since there are 100 million
>> >>> users registered on Tagged, that means we could have in theory 100M
>> >>> topics.  Of course in practice we have far fewer than that.  One of
>> >>> the main things we currently have to do is aggressively clean topics.
>> >>> But basically we have many topics, few messages (relatively) per
>> >>> topic.  And order matters, so we had to deal with ensuring that we
>> >>> could handle the number of topics we would create, and ensure ordered
>> >>> delivery and receipt.
>> >>>
>> >>> I have big plans for Kafka in the future.  Another feature is
>> >>> currently in private test and will be released to the public soon (it
>> >>> uses Kafka in a more traditional way).  And we hope to have many more
>> >>> in 2012...
>> >>>
>> >>
>>
>
>
