At Clearspring we have been using Apache Kafka since early 2011. It powers the AddThis Live View analytics [1], and the update [2] that product recently received involved yet more Kafka (three cheers for the log4j appender!).
The project that we originally started investigating Kafka for is somewhat larger: taking all of the view activity data generated by AddThis sharing tools and replacing pixels on a CDN with direct requests to our datacenters. The obvious and exciting benefit is that this gives us access to our data in seconds instead of waiting hours for access log delivery.

For that we have two datacenters, each with a web tier pushing to 60 Kafka servers (so 120 in total). Between the two DCs we employ custom bi-directional replication, so that batch and nearline analytics processes have access to a full copy of the data. We are receiving a bit over 3 billion events per day, and expect total events ingested by the system to grow briskly over the next year.

One choice that may seem somewhat unusual, and is perhaps worth noting, is that we are currently using the low-level producers and consumers exclusively. Each web server pushes to a local Kafka broker that it is co-located with. We are fans of multi-tenancy where possible and didn't want two different "kinds" of boxes; disk-oblivious web services and sequential-I/O-oriented Kafka were a natural fit. Our consumers all use Clearspring's analytics system [3], which already had integrated stream consumption and check-pointing.

Please let me know if you have any questions. There ought to be some blog posts with more details in the coming weeks.

[1] http://www.addthis.com/blog/2011/06/21/social-data-in-real-time-with-addthis-live-view/
[2] http://www.addthis.com/blog/2011/12/20/expanded-addthis-analytics-now-available-in-live-view/
[3] There are a few blog posts and presentations about analytics at Clearspring floating around. This one is the highest-level overview: http://www.clearspring.com/blog/2011/05/12/big-data-dc-analytics-at-clearspring/
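
For a rough sense of scale, here is a back-of-envelope sketch that turns the figures quoted above (a bit over 3 billion events per day, 120 Kafka servers) into average rates. It assumes events are spread evenly across the day and across brokers, and it ignores the cross-datacenter replication traffic, so real peak load per broker will be higher. The function name `average_rates` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(events_per_day, brokers):
    """Return (overall events/s, events/s per broker), assuming an even spread."""
    total_per_sec = events_per_day / SECONDS_PER_DAY
    return total_per_sec, total_per_sec / brokers


# Figures from the post: ~3 billion events/day, 2 DCs x 60 brokers.
# Assumption: uniform load; replication and traffic peaks are not modeled.
total, per_broker = average_rates(3_000_000_000, 120)
print(f"{total:,.0f} events/s overall, {per_broker:,.0f} events/s per broker")
```

Since the daily volume is "a bit over" 3 billion, these averages are best read as lower bounds.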