Mark,

That's a good suggestion. I created a page off the Kafka wiki (https://cwiki.apache.org/confluence/display/KAFKA/Index) and we can use it to document production use cases of Kafka. We will be adding LinkedIn's usage there soon.

Jun

On Fri, Nov 4, 2011 at 9:27 AM, Mark <static.void....@gmail.com> wrote:

> I am struggling with some core design concepts and I was hoping someone
> could explain how they use Kafka in production for event processing.
> For example, I've read that LinkedIn has 60+ metrics they collect and
> aggregate, i.e. page views, clicks, etc. I clearly grasp the concept of
> logging a page view event to Kafka, but I'm missing the last part. How
> does one go about aggregating this data and using it in any way other
> than as a simple data sink?
>
> Taking the "page_view" example further: what is the preferred way of
> logging and consuming this event? Would you have a consumer that just
> consumes page views? If so, how do you make sure you don't reconsume the
> same message in the event of a consumer restart? Also, for
> analytical/reporting needs, how do you deal with timeframes? Say my
> consumer is subscribed to the "page_view" topic and I want all messages
> from 8am-9am. Would I read all messages and filter out any that don't
> have a timestamp in that range, or would I create a separate topic for
> each hour, i.e. "page_view/08:00"? The same question applies to importing
> all "page_views" for yesterday into Hadoop.
>
> I know Kafka is a new project and I'm sure everyone's time is constrained,
> but I think it would be helpful if some high-level examples/use cases and
> best practices were added to the wiki. This could help gain adoption and
> hopefully bring in more willing contributors :)
>
> Thanks for your help