One downside is that if my logic were messed up, I wouldn't have a time window for rolling the logic back (which was one of the benefits of Kafka's design choice of keeping messages around for x days).
On Tue, May 15, 2012 at 11:42 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> What do you mean?
>
> "I think the direction we are going is instead to just let you co-locate
> this processing on the same box. This gives the isolation of separate
> processes and the overhead of the transfer over localhost is pretty minor."
>
> I see what you're saying, as it is a specific implementation/use case that
> diverges from a general-purpose mechanism; that's why I was suggesting maybe
> a hook/event-based system.
>
> On Tue, May 15, 2012 at 11:24 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Yeah, I see where you are going with that. We toyed with this idea, but
>> the idea of coupling processing to the log storage raises a lot of
>> problems for general-purpose usage. I think the direction we are going
>> is instead to just let you co-locate this processing on the same box.
>> This gives the isolation of separate processes, and the overhead of the
>> transfer over localhost is pretty minor.
>>
>> -Jay
>>
>> On Tue, May 15, 2012 at 6:38 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>> > Would it be possible to filter the collection before it gets flushed
>> > to disk?
>> >
>> > Say I am tracking page views per user, and I could perform a rollup
>> > before it gets flushed to disk (using a hashmap with the key being the
>> > sessionId, and incrementing a counter for the duplicate entries).
>> >
>> > And could this be done w/o modifying the original source, maybe through
>> > some sort of event/listener?
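
For illustration, a minimal sketch of the rollup idea from the original question, written as a co-located consumer process along the lines Jay describes: page-view events are counted in a HashMap keyed by sessionId, and only the rolled-up counts get flushed downstream. The PageView class and the handle()/flush() methods are placeholders for this sketch, not part of the Kafka API; wiring handle() to an actual consumer stream is left out.

import java.util.HashMap;
import java.util.Map;

// Sketch of the rollup running as its own process on the same box as the
// broker, consuming the raw topic over localhost. PageView, handle() and
// flush() are illustrative names, not anything from Kafka itself.
public class PageViewRollup {

    // One raw page-view event as it comes off the topic.
    static class PageView {
        final String sessionId;
        PageView(String sessionId) { this.sessionId = sessionId; }
    }

    // Running count per sessionId; duplicate events just bump the counter.
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Call this for every message pulled from the consumer stream.
    void handle(PageView view) {
        Integer current = counts.get(view.sessionId);
        counts.put(view.sessionId, current == null ? 1 : current + 1);
    }

    // Call this periodically (every N messages or M seconds) to write the
    // rolled-up counts somewhere durable, then start a fresh window.
    void flush() {
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        counts.clear();
    }
}

Because this runs as a separate consumer rather than inside the broker's flush path, the raw messages are still retained for the configured x days, so a bug in the rollup logic can be fixed and the data replayed, which is the rollback concern mentioned above.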