Congratulations on this, Chris! Are you planning to write a blog post about your experience using Kafka to enable live analytics?

Thanks,
Neha

On Fri, Jan 13, 2012 at 11:46 AM, Chris Burroughs <chris.burrou...@gmail.com> wrote:
> At Clearspring we have been using Apache Kafka since early 2011. It
> powers the AddThis Live View analytics [1], and the update [2] that
> product recently received involved yet more Kafka (three cheers for the
> log4j appender!).
>
> The project that we originally started investigating Kafka for is
> somewhat larger: taking all of the view activity data generated by
> AddThis sharing tools and replacing pixels on a CDN with direct requests
> to our datacenters. The obvious and exciting benefit is that this gives
> us access to our data in seconds instead of waiting hours for access log
> delivery.
>
> For that we have two datacenters, each with a web tier pushing to 60
> Kafka servers (so 120 in total). Between the two DCs we employ custom
> bi-directional replication, so that batch and nearline analytics
> processes have access to a full copy of the data. We are receiving a
> bit over 3 billion events per day, and expect total events ingested by
> the system to grow briskly over the next year.
>
> One choice that appears somewhat unusual and might be notable is that
> we are currently exclusively using the low-level producers/consumers.
> Each web server pushes to a local Kafka broker that it is co-located
> with (we are fans of multi-tenancy where possible and didn't want two
> different "kinds" of boxes; disk-oblivious web services and sequential
> I/O-oriented Kafka were a natural fit), and our consumers are all using
> Clearspring's analytics system [3], which already had
> integrated stream consumption and checkpointing.
>
> Please let me know if you have any questions. There ought to be some
> blog posts with more details in the coming weeks.
>
> [1]
> http://www.addthis.com/blog/2011/06/21/social-data-in-real-time-with-addthis-live-view/
>
> [2]
> http://www.addthis.com/blog/2011/12/20/expanded-addthis-analytics-now-available-in-live-view/
>
> [3] There are a few blog posts and presentations about analytics at
> Clearspring floating around. This one is the highest level overview:
> http://www.clearspring.com/blog/2011/05/12/big-data-dc-analytics-at-clearspring/
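
For a rough sense of scale, the figures quoted above (two datacenters, 60 brokers each, a bit over 3 billion events per day) can be turned into average rates. This is only a back-of-envelope sketch in Python. It assumes events arrive evenly over the day and are spread evenly across brokers, which the original message does not claim, and the function and key names are made up for illustration.

```python
# Back-of-envelope average rates from the figures quoted in the thread.
# Assumptions (not stated in the original message): traffic is uniform
# over the day and evenly balanced across datacenters and brokers.

EVENTS_PER_DAY = 3_000_000_000   # "a bit over 3 billion events per day"
DATACENTERS = 2
BROKERS_PER_DC = 60              # 120 brokers in total
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(events_per_day, datacenters, brokers_per_dc):
    """Return average ingest rates under the even-spread assumption."""
    total_brokers = datacenters * brokers_per_dc
    per_second = events_per_day / SECONDS_PER_DAY
    return {
        "total_brokers": total_brokers,
        "events_per_second": per_second,
        "events_per_second_per_dc": per_second / datacenters,
        "events_per_second_per_broker": per_second / total_brokers,
    }


if __name__ == "__main__":
    rates = average_rates(EVENTS_PER_DAY, DATACENTERS, BROKERS_PER_DC)
    for name, value in rates.items():
        print(f"{name}: {value:,.0f}")
```

Under those assumptions the average comes to roughly 35 thousand events per second overall, or a few hundred per broker. Real peaks would be higher, and the per-broker figure counts only web-tier ingest, not the traffic replicated between the two DCs.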