Re: RE: Architecture Consulting

Tim Lossen Tue, 19 Jun 2012 15:23:12 -0700

no, we do not preprocess and republish the events, although we
have toyed with the idea. currently, all our consumers do their own
preprocessing (ip lookup etc.).


cheers
tim


On 2012-06-19, at 10:09 PM, Guy Doulberg wrote:

Hi Tim,
Thanks for your reply,

Does your implementation have an enrichment process?
If so, how do you perform the enrichment on each of the topics?


Thanks, Guy

________________________________________
From: Tim Lossen [t...@lossen.de]
Sent: Tuesday, 19 June 2012 18:53
To: kafka-users@incubator.apache.org
Subject: Re: Architecture Consulting

well, we decided to go with one topic per game (approach 2),
as there are some consumers which are only interested in data
from a single topic. makes it a bit harder for consumers interested
in processing ALL events though.
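
a minimal sketch of what a consumer interested in ALL events has to do with one topic per game: read every per-game stream and merge them back into a single ordered stream. the topic names, the (timestamp, payload) tuples and the all_events helper are hypothetical, and plain lists stand in for real kafka topics.

```python
import heapq

# Hypothetical per-game topics; each list is already ordered by timestamp.
game_topics = {
    "game-a": [(1, "a-login"), (4, "a-purchase")],
    "game-b": [(2, "b-login"), (3, "b-level-up")],
}

def all_events(topics_by_name):
    """Merge the per-topic streams into one stream ordered by timestamp."""
    return list(heapq.merge(*topics_by_name.values(), key=lambda e: e[0]))

merged = all_events(game_topics)
for timestamp, payload in merged:
    print(timestamp, payload)
```

a consumer that only cares about one game simply reads its own list and skips the merge.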

not knowing more about your concrete situation, it is difficult
to decide what is better in your case.

cheers
tim


On 2012-06-19, at 15:12, Guy Doulberg wrote:
Hi all,

We'd like to consult with you about our Kafka architecture.
We have HTTP endpoints that receive events from the web and push them into the system via Kafka. The events are distinguishable by their HTTP URL, and are sharded to their corresponding topics.
We have 2 designs in mind:

1. One main 'raw' topic, split into multiple enriched topics.
   The endpoints write to one Kafka topic, let's call it the 'raw topic'.
   The 'raw topic' is consumed by a Kafka consumer which does the following:
   i  - enriches the data (extracts ip-to-location info, standardizes browser/OS type, etc.)
   ii - feeds the enriched data to a new topic, based on the referrer information.

2. Multiple 'raw' topics, each fed to its corresponding 'enriched' topic.
   Have the web endpoints shard the events based on their referrer, creating multiple 'raw' topics, one per referrer type/domain.
   Each 'raw' topic is then consumed, and a corresponding enriched stream/topic is created from it.

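
A minimal sketch of the option 1 pipeline, assuming in-memory lists in place of real Kafka topics. The event fields (ip, referrer, user_agent), the IP_TO_LOCATION table and the "enriched.<domain>" naming scheme are all hypothetical and only illustrate the enrich-then-route step:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical stand-in for Kafka: topic name -> list of messages.
topics = defaultdict(list)

# Hypothetical lookup table; a real system would query a GeoIP database.
IP_TO_LOCATION = {"203.0.113.7": "DE", "198.51.100.4": "IL"}

def enrich(event):
    """Return a copy of the event with location and normalised browser added."""
    enriched = dict(event)
    enriched["location"] = IP_TO_LOCATION.get(event["ip"], "unknown")
    browser = event.get("user_agent", "").split("/")[0].lower()
    enriched["browser"] = browser or "unknown"
    return enriched

def enriched_topic_for(event):
    """Derive the target topic name from the referrer's domain."""
    host = urlparse(event.get("referrer", "")).hostname or "unknown"
    return "enriched." + host

def process_raw_topic():
    """Option 1: consume the single raw topic, enrich, then fan out."""
    for event in topics["raw"]:
        enriched = enrich(event)
        topics[enriched_topic_for(enriched)].append(enriched)

topics["raw"].append({"ip": "203.0.113.7",
                      "referrer": "http://game-a.example/play",
                      "user_agent": "Firefox/13.0"})
topics["raw"].append({"ip": "192.0.2.1",
                      "referrer": "http://game-b.example/",
                      "user_agent": "Chrome/19.0"})
process_raw_topic()
```

Under option 2 the same enrich function would run once per 'raw' topic, and the routing step would move into the web endpoints.
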
The dilemma is whether to separate the events into topics as soon as we can, at the web endpoint (option 2), or to postpone it as long as possible (option 1).
I prefer option 1, but tests I ran reveal that in a scenario where there are many event types in the same topic, and some event types have many more occurrences than others, the more frequent event types seem to "drown" the less common ones. Roughly, this means that less common events may reach their consumers much later than the more frequent ones. If my system requires 'timely' processing of events, this behaviour poses a problem.
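
A toy model of one way this "drowning" can arise, assuming a single consumer that is the bottleneck and handles one event per second. The completion_times helper and the event names are hypothetical; this is not a measurement of Kafka itself:

```python
from collections import deque

def completion_times(queues, seconds_per_event=1):
    """Drain the queues round-robin, one event per turn.

    Returns a dict mapping each event id to the time it finished processing.
    """
    finished = {}
    clock = 0
    pending = [deque(q) for q in queues]
    while any(pending):
        for q in pending:
            if q:
                clock += seconds_per_event
                finished[q.popleft()] = clock
    return finished

frequent = [f"common-{i}" for i in range(1000)]
rare = ["rare-0"]

# Shared topic: the rare event sits behind a backlog of frequent events.
shared = completion_times([frequent + rare])
# Separate topics: the consumer alternates between the two topics.
separate = completion_times([frequent, rare])

print(shared["rare-0"], separate["rare-0"])
```

In this model the rare event finishes at second 1001 in the shared topic but at second 2 when it has its own topic. The cost is that the consumer has to poll several topics.
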
What do you think? thanks
--
http://tim.lossen.de


