Vaibhav,

Thanks for explaining your use case. I think I see the requirement here. It seems like you need the data in S3 since you use Elastic MapReduce to process your data. I guess that's the reason the Hadoop input/output formats that Kafka provides are not directly useful.
I have some ideas on how this can be done. I will write them up on a wiki soon.

Thanks,
Neha

On Tue, Mar 20, 2012 at 10:21 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:
> Neha,
>
> My requirement is not related to Russell's, but I thought it would be
> helpful to describe what we need at GumGum <http://gumgum.com/>.
> I wasn't sure whether it's in Kafka's domain, since Kafka gives you a topic
> to pull data from and then it's up to you to do whatever you want with it.
>
> But since we are talking about it, here is what we do every day (currently
> without Kafka):
>
> We are an ad network. We write all of our impressions and clicks data to
> various log files and upload them to S3. At night we run many MapReduce jobs
> to aggregate this data in various ways.
> We have an 'Autoscaled' cluster in AWS. Our web servers keep going up and
> down based on the load on the system.
>
> Whenever a server shuts down we tend to lose data. Many times the file
> upload does not complete before the server shuts down. That is why we are
> looking at implementing Kafka to send events to S3 in real time without
> losing them.
>
> If there were a 'sink' that transferred data to S3, our job would be a lot
> easier. But again, I am not sure whether Kafka is supposed to provide that
> or not.
>
> Regards,
> Vaibhav
>
> On Tue, Mar 20, 2012 at 10:03 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>> Russell,
>>
>> By "sink events into S3", do you mean you want some plugin that will
>> suck data out of your Kafka brokers and upload it to S3? Would you mind
>> describing use cases that would require sending data to Kafka, then
>> uploading it to S3, and then using it by querying S3?
>>
>> Thanks,
>> Neha
>>
>> On Mar 20, 2012 4:51 PM, "Russell Jurney" <russell.jur...@gmail.com> wrote:
>>> I think as soon as someone commits code that reliably sinks events to S3,
>>> Kafka adoption will skyrocket. There is no good solution to this yet.
>>> MANY people want one.
>>>
>>> Russ
>>>
>>> On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <fe...@mate1inc.com> wrote:
>>>> The primary use case for Kafka is to use it on AWS...???
>>>>
>>>> Sorry if I put words you didn't intend in your mouth :P ... I just
>>>> thought that sounded funny ;)
>>>>
>>>> Sorry for being off-topic. Carry on :/ !
>>>>
>>>> --
>>>> Felix
>>>>
>>>> On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>> Yeah, that is the part I am hoping someone will contribute :) I know I
>>>>> can write that myself. I also know it will be buggy and that I will
>>>>> have lots of trouble.
>>>>>
>>>>> If you contribute this code, it would be a huge boon to Kafka. It is imo
>>>>> the primary use case for Kafka atm... if only the code gets into git.
>>>>>
>>>>> On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sand...@gmail.com> wrote:
>>>>>> Russell,
>>>>>>
>>>>>> I'm actually in the process of writing Java code to go from Kafka
>>>>>> messages to S3. I might be able to rip out my application-specific
>>>>>> parts and share something later tonight.
>>>>>>
>>>>>> The biggest hassle is that you can't append to existing S3 files. So
>>>>>> unless you're planning on uploading each message as a separate S3
>>>>>> object, you need message aggregation smarts on the Kafka
>>>>>> consumer / S3 uploader side of things.
>>>>>>
>>>>>> Best,
>>>>>> Niek
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>> I wish someone would publish some source that writes events to S3.
>>>>>>>
>>>>>>> Russell Jurney
>>>>>>> twitter.com/rjurney
>>>>>>> russell.jur...@gmail.com
>>>>>>> datasyndrome.com
>>>>>>>
>>>>>>> On Mar 20, 2012, at 11:20 AM, Dave Fayram <dfay...@gmail.com> wrote:
>>>>>>>> We've been successfully using Kafka on AWS as well, and for JMX we
>>>>>>>> just use an SSH tunnel.
>>>>>>>>
>>>>>>>> In general, we've been very happy with the performance on AWS, which
>>>>>>>> some people have reservations about due to the I/O situation on most
>>>>>>>> Amazon boxes.
>>>>>>>>
>>>>>>>> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju <gautam.singar...@gmail.com> wrote:
>>>>>>>>> We have been considering Kafka for a new Data Platform. Has anyone
>>>>>>>>> used Kafka in AWS? If so, could you please share your experiences
>>>>>>>>> with us?
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>> ---
>>>>>>>>> Gautam
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dave Fayram
>>>>>>>> dfay...@gmail.com
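Niek's point that S3 objects cannot be appended to means the consumer side has to aggregate messages and upload each batch as one object. A minimal Python sketch of that aggregation, under stated assumptions: it uses no real Kafka or S3 client API; messages are fed in as hypothetical `(partition, offset, payload)` tuples, `upload(key, body)` is a hypothetical callable standing in for an S3 PUT, and the key layout and thresholds are illustrative choices, not anything the thread specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class S3BatchWriter:
    """Buffers messages per partition and writes each batch as ONE object.

    Because S3 objects cannot be appended to, messages are accumulated
    locally and a whole batch is uploaded at once. `upload` is a
    hypothetical stand-in for a real S3 PUT (e.g. via an AWS SDK client).
    """
    topic: str
    upload: Callable[[str, bytes], None]
    max_messages: int = 1000           # flush after this many messages...
    max_bytes: int = 8 * 1024 * 1024   # ...or once the batch body is this large
    prefix: str = "kafka"              # hypothetical key prefix
    # partition -> pending (offset, payload) pairs not yet uploaded
    _buffers: Dict[int, List[Tuple[int, bytes]]] = field(
        default_factory=dict, init=False, repr=False)
    # partition -> byte size of the pending batch body (payload + newline)
    _sizes: Dict[int, int] = field(default_factory=dict, init=False, repr=False)
    # partition -> next offset to resume from after a successful upload
    committed: Dict[int, int] = field(default_factory=dict, init=False)

    def add(self, partition: int, offset: int, payload: bytes) -> None:
        buf = self._buffers.setdefault(partition, [])
        buf.append((offset, payload))
        self._sizes[partition] = self._sizes.get(partition, 0) + len(payload) + 1
        if len(buf) >= self.max_messages or self._sizes[partition] >= self.max_bytes:
            self.flush(partition)

    def flush(self, partition: int) -> None:
        buf = self._buffers.get(partition)
        if not buf:
            return
        first, last = buf[0][0], buf[-1][0]
        # Key derived from the offset range, so re-reading the same batch
        # after a crash yields the same key and overwrites, not duplicates.
        key = f"{self.prefix}/{self.topic}/{partition}/{first:020d}-{last:020d}"
        body = b"".join(payload + b"\n" for _, payload in buf)
        self.upload(key, body)  # if this raises, the buffer is kept for a retry
        # Only after a successful upload do we advance the resume offset.
        self.committed[partition] = last + 1
        self._buffers[partition] = []
        self._sizes[partition] = 0

    def flush_all(self) -> None:
        """Upload every pending batch, e.g. from a shutdown path."""
        for partition in list(self._buffers):
            self.flush(partition)
```

A quick usage example with an in-memory dict in place of S3:

```python
uploaded = {}
writer = S3BatchWriter(topic="clicks", upload=lambda k, b: uploaded.__setitem__(k, b),
                       max_messages=2)
writer.add(0, 10, b"a")
writer.add(0, 11, b"b")   # reaches max_messages, so one object is written
writer.add(0, 12, b"c")
writer.flush_all()        # writes the remaining partial batch
```

The design choice that matters for Vaibhav's autoscaling case is that the resume offset only advances after an upload succeeds. If a web server or consumer dies with a batch still buffered, those messages are still in Kafka rather than only on the dying machine, so a restarted consumer can re-read them from the last committed offset. That holds only if offsets are actually committed back to Kafka (or stored elsewhere) after each upload, which this sketch leaves out. The same-key overwrite also assumes batch boundaries are deterministic, which is why the sketch cuts batches by count and size rather than by wall-clock time.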