Thanks for initiating this. I think this is a good first step towards
unifying batch and stream processing in Kafka.

I understood this capability to be simple yet very useful: it allows a
Streams program to process a log in batches of arbitrary size, each defined
by the range between the current offset and the high watermark (HW).
Basically, it provides a simple means for a Streams program to stop after
processing a batch (just like a batch program would) and continue where it
left off when restarted. In other words, it gives a Streams app batch
processing behavior without code changes.
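
To make these semantics concrete, here is a minimal sketch of the
stop-at-HW behavior as I understand it. It does not use the Kafka Streams
API: `InMemoryLog` and `run_batch` are hypothetical names, the partition is
a plain Python list, the HW is assumed to equal the log end offset, and the
offset is committed after every record for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for a single Kafka partition (not a Kafka API)."""

    records: List[str] = field(default_factory=list)
    committed_offset: int = 0  # next offset to read, like a consumer's committed offset

    def high_watermark(self) -> int:
        # Simplifying assumption: every appended record is fully replicated,
        # so the HW equals the log end offset.
        return len(self.records)


def run_batch(log: InMemoryLog, process: Callable[[str], None]) -> int:
    """Process records from the committed offset up to the HW read at startup.

    Returns how many records this run processed, then "stops" like a batch job.
    """
    stop_at = log.high_watermark()  # snapshot once; later appends wait for the next run
    processed = 0
    while log.committed_offset < stop_at:
        process(log.records[log.committed_offset])
        log.committed_offset += 1  # commit after every record (simplification)
        processed += 1
    return processed
```

Records appended while a run is in progress fall outside the HW snapshot
and are simply picked up by the next run, starting from the committed
offset.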

This feature is useful, but I do not think it is necessary to add a
metadata topic. After all, the user doesn't really care exactly where the
batch ends. This feature allows an app to "process as much data as there is
to process", and it determines how much data there is by reading the HW on
startup. If new data is written to the log right after the app starts up,
it will be processed the next time the app is restarted. If the app starts,
reads the HW, and then fails, it will restart, read the HW again, and
process a little more before it stops again. The fact that the HW changes
in some scenarios isn't an issue, since a batch program that behaves this
way doesn't really care exactly what that HW is.
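
A second hypothetical sketch of the failure case, again a plain Python
simulation rather than Kafka code (`run`, `SimulatedCrash`, and the `state`
dict are made-up names). The only state carried across runs is the
committed offset; no batch end offset is stored anywhere, and the HW is just
re-read on every start. It commits after every record, so nothing is
reprocessed here; with periodic commits under at-least-once processing,
records after the last commit would be processed again.

```python
from typing import Callable, Dict, List, Optional


class SimulatedCrash(Exception):
    """Raised to mimic the app failing partway through a batch."""


def run(
    log: List[str],
    state: Dict[str, int],
    process: Callable[[str], None],
    crash_after: Optional[int] = None,
) -> int:
    """One start of a hypothetical batch-mode app over an in-memory partition.

    state["committed"] (the next offset to read) is the only thing that
    survives between runs; no end offset is persisted anywhere.
    Returns the number of records processed in this run.
    """
    stop_at = len(log)  # HW re-read on every start (assumed equal to log end)
    done = 0
    while state["committed"] < stop_at:
        if crash_after is not None and done == crash_after:
            raise SimulatedCrash(f"crashed before offset {state['committed']}")
        process(log[state["committed"]])
        state["committed"] += 1  # commit after every record (simplification)
        done += 1
    return done
```

For example, with `log = ["a", "b", "c", "d"]`, a first run with
`crash_after=2` processes `a` and `b` and leaves `state["committed"] == 2`.
If `e` and `f` are appended before the restart, the next run reads a HW of
6 instead of 4 and processes `c` through `f`: the batch end moved, but no
record was skipped and no extra topic was needed.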

There might be cases that require adding more topics, but I would shy away
from adding complexity wherever possible, as it makes the system harder to
operate.

Other than this issue, I'm +1 on adding this feature. I think it is pretty
powerful.


On Mon, Nov 28, 2016 at 10:48 AM Matthias J. Sax <matth...@confluent.io>
wrote:

> Hi all,
>
> I want to start a discussion about KIP-95:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>
> Looking forward to your feedback.
>
>
> -Matthias
>
>
> --
Thanks,
Neha
