Wanted to see if I can resurrect this thread. I'm looking for anyone who's running Kafka on AWS. An S3 consumer for Kafka is particularly interesting. Any help would be truly appreciated.
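To make the ask concrete, here is a rough sketch of the kind of standalone consumer I mean: read from Kafka, buffer records, and put each batch to S3 as an object, with no Hadoop cluster involved. It assumes the Java consumer client and the AWS SDK for Java; the broker address, group id, topic, and bucket names are placeholders, and a real sink would still need proper batching, key naming, and error handling.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class KafkaToS3Consumer {
    public static void main(String[] args) {
        // Plain Kafka consumer configuration; broker and group id are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-sink");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events")); // placeholder topic

        // The S3 client picks up credentials from the environment or instance role.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        StringBuilder batch = new StringBuilder();
        long batchId = 0;

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                batch.append(record.value()).append('\n');
            }
            // Write each non-empty poll as one S3 object. A real sink would size
            // batches by bytes or time rather than by poll.
            if (batch.length() > 0) {
                String key = "kafka/events/batch-" + (batchId++); // placeholder key scheme
                s3.putObject("my-log-bucket", key, batch.toString()); // placeholder bucket
                batch.setLength(0);
            }
        }
    }
}

Naming keys by topic, partition, and starting offset instead of a running counter would make uploads idempotent across restarts, which is probably the hardest part to get right.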
>> it does not require your Hadoop cluster to be permanent. Like any MR job
>> that outputs to S3, once the data is in S3 it is there for good (unless you
>> explicitly delete it).

On Sun, Aug 19, 2012 at 10:10 AM, Russell Jurney <russell.jur...@gmail.com> wrote:

> Thanks, I'll check this out.
>
> On Sun, Aug 19, 2012 at 7:09 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
> > Thanks for your response, and glad to hear you need this as well and are
> > working on it.
> >
> > Does using s3n:// file-path require that you have a Hadoop cluster
> > running? I use S3 and EMR, so my Hadoop clusters are temporary. I do use
> > Hadoop with S3 to consume the data Kafka produces, so I am fine with Hadoop
> > as a dependency - at the library level, but not if a cluster must persist
> > for the Kafka S3 consumer to work.
> >
> > On Sat, Aug 18, 2012 at 9:20 AM, Matthew Rathbone <matt...@foursquare.com> wrote:
> >
> >> Hey Russell,
> >>
> >> We're actually about to start work on this exact thing here at foursquare
> >> as we're about to start prototyping kafka to replace our aging log
> >> infrastructure.
> >>
> >> We'd planned on just using the hadoop-consumer, but setting the output
> >> directory to a S3n:// file-path.
> >>
> >> I'm assuming that you want to build a consumer that operates outside of
> >> hadoop?
> >>
> >> On Sat, Aug 18, 2012 at 12:49 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >>
> >> > Ok, this is the last time I'm gonna beg for an S3 sink for Kafka. I'm
> >> > not trolling, and this is Your Big Chance to help!
> >> >
> >> > I'm gonna blog about using Whirr to boot Zookeeper and then to boot
> >> > Kafka in the cloud and then create events in an application that get
> >> > sunk to Amazon S3, where they will be processed by
> >> > Pig/Hadoop/ElasticMapReduce, mined into gems and republished in some
> >> > esoteric NoSQL DB and then served in the very app that generated the
> >> > events in the first place.
> >> >
> >> > So, if someone else doesn't contribute an S3 consumer for Kafka in the
> >> > next month or so... so help me Bob, I'm gonna write it myself. Now,
> >> > some of you may not know me, but I am the 3rd best software engineer
> >> > in the world:
> >> > http://www.quora.com/Who-are-some-of-the-best-software-engineers-alive
> >> >
> >> > Those of you that have seen my code, however, are aware that as a
> >> > programmer, I am substandard. There's a gene that imparts exception
> >> > handling and algorithms, and they're missing from my genome.
> >> >
> >> > So let me be clear: you don't want me to write the S3 sink. A Kafka
> >> > committer or someone with a real job should write the S3 sink. As soon
> >> > as that thing is written and my blog post goes out, Kafka use will
> >> > spike and you'll all be famous.
> >> >
> >> > So this is a direct threat: I am writing an S3 consumer for Kafka
> >> > unless one of you steps up. And you will rue the day that piece of
> >> > crap ships.
> >> >
> >> > In return for your contribution, you will be named in my blog post as
> >> > open source citizen of the month, to be accompanied by a commemorative
> >> > plaque with a pixelated photo of me.
> >> >
> >> > Yours truly,
> >> >
> >> > Russell Jurney http://datasyndrome.com
> >>
> >> --
> >> Matthew Rathbone
> >> Foursquare | Software Engineer | Server Engineering Team
> >> matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
> >> 4sq <http://foursquare.com/rathboma>
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> datasyndrome.com

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
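On the hadoop-consumer approach in the quoted thread: as Matthew notes, the cluster does not have to be permanent. The job points its output at an s3n:// path and supplies S3 credentials, and once the output is committed the cluster can be torn down. A rough sketch of that job setup is below; bucket, prefix, and credential values are placeholders, and the hadoop-consumer's own input and mapper wiring is omitted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class S3OutputJobSetup {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        // Standard Hadoop settings for the s3n filesystem; values are placeholders.
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

        Job job = Job.getInstance(conf, "kafka-hadoop-consumer-to-s3");
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // The only difference from an HDFS job: the output path is an s3n:// URI.
        FileOutputFormat.setOutputPath(job, new Path("s3n://my-log-bucket/kafka/"));
        return job;
    }
}

On EMR the s3n filesystem and credentials are normally already configured, so in practice only the output path needs to change.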