Basically it would just be a consumer that writes to S3. Kafka handles
scaling the consumption while making sure each consumer gets a disjoint
subset of the data. We could probably make it a command line tool. You
would need some way to let the user control the format of the S3 data in a
pluggable fashion. It could be a contrib package, or even just a separate
github mini-project, since it works off the public api and would really
only be used by people who want to get data into S3.
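For concreteness, here is a rough sketch of what that consumer could look
like (purely for illustration it uses the newer Java consumer API and the
AWS SDK; the topic name, bucket, and one-value-per-line formatting are
placeholders, and the formatting step is exactly the part that would want
to be pluggable). Committing offsets only after a successful upload also
sidesteps the need for a separate offset store:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.File;
import java.io.PrintWriter;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class S3SinkConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-sink");          // each group member gets a disjoint set of partitions
        props.put("enable.auto.commit", "false");  // commit only after a successful upload
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events"));  // hypothetical topic

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-kafka-dump";                           // hypothetical bucket
        long batchId = 0;

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            if (records.isEmpty()) continue;

            // Write the batch to a local file; this formatting (one value per line)
            // is the part that would be user-pluggable.
            File batch = File.createTempFile("kafka-batch-", ".txt");
            try (PrintWriter out = new PrintWriter(batch)) {
                for (ConsumerRecord<String, String> record : records) {
                    out.println(record.value());
                }
            }

            s3.putObject(bucket, "events/batch-" + (batchId++), batch);
            batch.delete();

            // Only mark these offsets as consumed once the object is safely in S3.
            consumer.commitSync();
        }
    }
}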

-Jay

On Wed, May 23, 2012 at 8:21 AM, S Ahmed <sahmed1...@gmail.com> wrote:

> What would be needed to do this?
>
> Just thinking off the top of my head:
>
> 1. Create a ZooKeeper store to keep track of the last message offset
> persisted to S3, and which messages each consumer is processing.
>
> 2. Pull messages off, group them however you want (per message, every 10
> messages, etc.), and spin off an ExecutorService to push to S3 and update
> the ZooKeeper offset.
>
> I'm new to Kafka, so I would have to investigate how multiple consumers
> can pull messages and push to S3 without the consumers pulling the same
> messages, and how to set up a ZooKeeper store to track specifically what
> has been pushed to S3.
>
>
> On Wed, May 23, 2012 at 1:35 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
> > Yeah, no kidding. I keep waiting on one :)
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On May 22, 2012, at 10:31 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > No. Patches accepted.
> > >
> > > -Jay
> > >
> > > On Tue, May 22, 2012 at 10:23 PM, Russell Jurney
> > > <russell.jur...@gmail.com> wrote:
> > >
> > >> Is there a simple way to dump Kafka events to S3 yet?
> > >>
> > >> Russell Jurney http://datasyndrome.com
> > >>
> >
>
