I think there is no inherent reason we couldn't include a "transformation" plug-in that runs before data is written. But after some bad experiences I am kind of fundamentally against allowing application code into the infrastructure process. Can you flesh out the use case a little more with an example? Wouldn't doing a post-aggregation and re-publication to another topic work just as well?
-Jay

On Thu, May 17, 2012 at 6:40 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> Oh, maybe this isn't possible again since the object is mapped to a file,
> and it may already have flushed data at the OS level?
>
> On Tue, May 15, 2012 at 11:43 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>
>> One downside is that if my logic was messed up, I wouldn't have a window
>> in which to roll the logic back (which was one of the benefits of Kafka's
>> design choice of keeping messages around for x days).
>>
>>
>> On Tue, May 15, 2012 at 11:42 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>>
>>> What do you mean?
>>>
>>> "I think the direction we are going
>>> is instead to just let you co-locate this processing on the same box.
>>> This gives the isolation of separate processes and the overhead of the
>>> transfer over localhost is pretty minor."
>>>
>>>
>>> I see what you're saying, as it is a specific implementation/use case
>>> that diverges from a general-purpose mechanism; that's why I was
>>> suggesting maybe a hook/event-based system.
>>>
>>>
>>> On Tue, May 15, 2012 at 11:24 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>
>>>> Yeah, I see where you are going with that. We toyed with this idea, but
>>>> the idea of coupling processing to the log storage raises a lot of
>>>> problems for general-purpose usage. I think the direction we are going
>>>> is instead to just let you co-locate this processing on the same box.
>>>> This gives the isolation of separate processes, and the overhead of the
>>>> transfer over localhost is pretty minor.
>>>>
>>>> -Jay
>>>>
>>>> On Tue, May 15, 2012 at 6:38 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>>>> > Would it be possible to filter the collection before it gets flushed
>>>> > to disk?
>>>> >
>>>> > Say I am tracking page views per user, and I could perform a rollup
>>>> > before it gets flushed to disk (using a hashmap with the key being
>>>> > the sessionId, incrementing a counter for the duplicate entries).
>>>> >
>>>> > And could this be done w/o modifying the original source, maybe
>>>> > through some sort of event/listener?
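For what it's worth, the alternative suggested above (a co-located process that consumes the raw topic, rolls up per session, and republishes to an aggregate topic) might look roughly like the sketch below. This is a minimal, hypothetical illustration, not anything in Kafka itself: the `RollupBuffer` class, topic names, and record shape are all made up, and the actual Kafka consume/produce wiring (shown here only as a comment, using the modern Java client API for illustration) is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the rollup discussed in this thread: instead of
 * filtering inside the broker before flush, a co-located consumer
 * aggregates page views per session in a hashmap and republishes the
 * counts to a second topic. Only the in-memory aggregation is shown.
 */
public class RollupBuffer {
    // sessionId -> page-view count for the current window
    private final Map<String, Long> counts = new HashMap<>();

    /** Record one page-view event (e.g. one message from the raw topic). */
    public void record(String sessionId) {
        counts.merge(sessionId, 1L, Long::sum);
    }

    /**
     * Drain the current window. In the real pipeline each entry would be
     * published to an aggregate topic, e.g. (modern Java client, for
     * illustration only):
     *   producer.send(new ProducerRecord<>("pageviews-rollup", sessionId, count));
     * with consumer offsets committed only after the send succeeds.
     */
    public Map<String, Long> flush() {
        Map<String, Long> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        RollupBuffer buf = new RollupBuffer();
        // Simulate duplicate entries for the same session, as in the example.
        buf.record("session-a");
        buf.record("session-a");
        buf.record("session-b");
        Map<String, Long> rolled = buf.flush();
        System.out.println(rolled.get("session-a")); // 2
        System.out.println(rolled.get("session-b")); // 1
        System.out.println(buf.flush().isEmpty());   // true: buffer resets
    }
}
```

This keeps the broker untouched and gives you the rollback window the log retention provides: if the rollup logic is wrong, the raw topic still holds the original messages for x days and the aggregates can be recomputed.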