Thanks! There should be a few Kafka-related blog posts from Clearspring coming soon.

On 01/13/2012 03:22 PM, Neha Narkhede wrote:
> Congratulations on this, Chris! Wondering if you would be writing a
> blog post on your experience of using Kafka for enabling live
> analytics?
>
> Thanks
> Neha
>
> On Fri, Jan 13, 2012 at 11:46 AM, Chris Burroughs
> <chris.burrou...@gmail.com> wrote:
>> At Clearspring we have been using Apache Kafka since early 2011. It
>> powers the AddThis Live View analytics [1], and the update [2] that
>> product recently received involved yet more Kafka (three cheers for the
>> log4j appender!).
>>
>> The project that we originally started investigating Kafka for is
>> somewhat larger: taking all of the view activity data generated by
>> AddThis sharing tools and replacing pixels on a CDN with direct requests
>> to our datacenters. The obvious and exciting benefit is that this gives
>> us access to our data in seconds instead of waiting hours for access log
>> delivery.
>>
>> For that we have two datacenters, each with a web tier pushing to 60
>> Kafka servers (so 120 in total). Between the two DCs we employ custom
>> bi-directional replication, so that batch and nearline analytics
>> processes have access to a full copy of the data. We are receiving a
>> bit over 3 billion events per day, and expect total events ingested by
>> the system to grow briskly over the next year.
>>
>> One choice that appears somewhat unusual and might be notable is that
>> we are currently exclusively using the low-level producers/consumers.
>> Each web server pushes to a local Kafka broker that it is co-located
>> with (we are fans of multi-tenancy where possible and didn't want two
>> different "kinds" of boxes; disk-oblivious web services and sequential
>> I/O-oriented Kafka were a natural fit), and our consumers are all using
>> Clearspring's analytics system [3], which already had
>> integrated stream consumption and check-pointing.
>>
>> Please let me know if you have any questions. There ought to be some
>> blog posts with more details in the coming weeks.
>>
>> [1]
>> http://www.addthis.com/blog/2011/06/21/social-data-in-real-time-with-addthis-live-view/
>>
>> [2]
>> http://www.addthis.com/blog/2011/12/20/expanded-addthis-analytics-now-available-in-live-view/
>>
>> [3] There are a few blog posts and presentations about analytics at
>> Clearspring floating around. This one is the highest level overview:
>> http://www.clearspring.com/blog/2011/05/12/big-data-dc-analytics-at-clearspring/