Re: [DISCUSS] KIP-28 - Add a transform client for data processing

Ewen Cheslack-Postava Thu, 23 Jul 2015 22:26:19 -0700

Just some notes on the KIP doc itself:

* It'd be useful to clarify at what point the plain consumer + custom code
+ producer breaks down. I think trivial filtering and aggregation on a
single stream usually work fine with this model. It breaks down once you
need more complex joins, windowing, etc. I think most
interesting applications require that functionality, but it's helpful to
make this really clear in the motivation -- right now, Kafka only provides
the lowest level plumbing for stream processing applications, so most
interesting apps require very heavyweight frameworks.
* I think the feature comparison of plain producer/consumer, stream
processing frameworks, and this new library is a good start, but we might
want something more thorough and structured, like a feature matrix. Right
now it's hard to figure out exactly how they relate to each other.
* I'd personally push the library vs. framework story very strongly -- the
total buy-in and weak integration story of stream processing frameworks is
a big downside and makes a library a really compelling (and currently
unavailable, as far as I am aware) alternative.
* The comment about in-memory storage of other frameworks is interesting --
it is specific to the framework, but is supposed to also give performance
benefits. The high-level functional processing interface would allow for
combining multiple operations when there's no shuffle, but when there is a
shuffle, we'll always be writing to Kafka, right? Spark (and presumably
Spark Streaming) is supposed to get a big win by handling shuffles such
that the data just stays in cache and never actually hits disk, or at least
only hits disk in the background. Will we take a hit because we always
write to Kafka?
* I really struggled with the structure of the KIP template with Copycat
because the flow doesn't work well for proposals like this. They aren't
the kind of concrete change the KIP template was designed for. I'd
completely ignore that template in favor of optimizing for clarity if I
were you.
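
To make the first point above concrete, here is a minimal sketch of the
consumer + custom code + producer loop in the easy case: a stateless filter
plus a running per-key count on a single stream. Plain Python lists stand in
for topics, so nothing here is the Kafka client API, and the names `process`
and `keep` and the (key, value) record layout are made up for illustration.

```python
from collections import defaultdict


def process(input_records, keep):
    """Filter (key, value) records and emit a running count per key.

    input_records stands in for what a consumer poll loop would yield, and
    the returned list stands in for what a producer would send. Both are
    assumptions of this sketch, not real client calls.
    """
    counts = defaultdict(int)  # local in-memory state, lost on restart
    output = []
    for key, value in input_records:
        if not keep(key, value):  # stateless filter, one record at a time
            continue
        counts[key] += 1  # aggregation over a single stream
        output.append((key, counts[key]))
    return output


# Hypothetical click records: (user, page).
clicks = [
    ("alice", "/home"),
    ("bob", "/admin"),
    ("alice", "/cart"),
    ("alice", "/admin"),
]
result = process(clicks, keep=lambda user, page: page != "/admin")
print(result)
```

Joining two streams or windowing by time would additionally need buffering
across inputs, some notion of time alignment, and state that survives a
restart, none of which this loop provides.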


-Ewen

On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi all,
>
> I just posted KIP-28: Add a transform client for data processing
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
>
> The wiki page does not yet have the full design / implementation details,
> and this email is to kick-off the conversation on whether we should add
> this new client with the described motivations, and if yes what features /
> functionalities should be included.
>
> Looking forward to your feedback!
>
> -- Guozhang
>



-- 
Thanks,
Ewen
