One downside is that if my logic were messed up, I wouldn't have a time window for rolling the logic back (which was one of the benefits of Kafka's design choice of keeping messages around for x days).
On Tue, May 15, 2012 at 11:42 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> What do you mean?
>
> "I think the direction we are going is instead to just let you co-locate
> this processing on the same box. This gives the isolation of separate
> processes and the overhead of the transfer over localhost is pretty minor."
>
> I see what you're saying, as it is a specific implementation/use case that
> diverges from a general-purpose mechanism; that's why I was suggesting maybe
> a hook/event-based system.
>
> On Tue, May 15, 2012 at 11:24 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Yeah, I see where you are going with that. We toyed with this idea, but
>> the idea of coupling processing to the log storage raises a lot of
>> problems for general-purpose usage. I think the direction we are going
>> is instead to just let you co-locate this processing on the same box.
>> This gives the isolation of separate processes, and the overhead of the
>> transfer over localhost is pretty minor.
>>
>> -Jay
>>
>> On Tue, May 15, 2012 at 6:38 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>> > Would it be possible to filter the collection before it gets flushed
>> > to disk?
>> >
>> > Say I am tracking page views per user, and I could perform a rollup
>> > before it gets flushed to disk (using a hashmap with the key being the
>> > sessionId, and incrementing a counter for the duplicate entries).
>> >
>> > And could this be done w/o modifying the original source, maybe through
>> > some sort of event/listener?
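
For illustration, a minimal sketch of the rollup idea from the original question, written as a co-located consumer process along the lines Jay describes: page-view events are counted in a HashMap keyed by sessionId, and only the rolled-up counts get flushed downstream. The PageView class and the handle()/flush() methods are placeholders for this sketch, not part of the Kafka API; wiring handle() to an actual consumer stream is left out.

import java.util.HashMap;
import java.util.Map;

// Sketch of the rollup running as its own process on the same box as the
// broker, consuming the raw topic over localhost. PageView, handle() and
// flush() are illustrative names, not anything from Kafka itself.
public class PageViewRollup {

    // One raw page-view event as it comes off the topic.
    static class PageView {
        final String sessionId;
        PageView(String sessionId) { this.sessionId = sessionId; }
    }

    // Running count per sessionId; duplicate events just bump the counter.
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Call this for every message pulled from the consumer stream.
    void handle(PageView view) {
        Integer current = counts.get(view.sessionId);
        counts.put(view.sessionId, current == null ? 1 : current + 1);
    }

    // Call this periodically (every N messages or M seconds) to write the
    // rolled-up counts somewhere durable, then start a fresh window.
    void flush() {
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        counts.clear();
    }
}

Because this runs as a separate consumer rather than inside the broker's flush path, the raw messages are still retained for the configured x days, so a bug in the rollup logic can be fixed and the data replayed, which is the rollback concern mentioned above.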