+1

In my opinion, this limitation is perfectly fine for the MVP. Watermark
alignment is a long-standing issue and this already moves the ball so far
forward.

I don't expect this will cause many issues in practice, as I understand it
the FileSource always processes one split at a time, and in my experience,
90% of Kafka users have a small number of partitions scale their pipelines
to have one reader per partition. Obviously, there are larger-scale Kafka
topics and more sources that will be ported over in the future but I think
there is an implicit understanding that aligning sources adds latency to
pipelines, and we can frame the follow-up "per-split" alignment as an
optimization.

On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <piotr.nowoj...@gmail.com>
wrote:

> Hey!
>
> A couple of weeks ago me and Arvid Heise played around with an idea to
> address a long standing issue of Flink: lack of watermark/event time
> alignment between different parallel instances of sources, that can lead to
> ever growing state size for downstream operators like WindowOperator.
>
> We had an impression that this is relatively low hanging fruit that can be
> quite easily implemented - at least partially (the first part mentioned in
> the FLIP document). I have written down our proposal [1] and you can also
> check out our PoC that we have implemented [2].
>
> We think that this is a quite easy proposal, that has been in large part
> already implemented. There is one obvious limitation of our PoC. Namely we
> can only easily block individual SourceOperators. This works perfectly fine
> as long as there is at most one split per SourceOperator. However it
> doesn't work with multiple splits. In that case, if a single
> `SourceOperator` is responsible for processing both the least and the most
> advanced splits, we won't be able to block this most advanced split for
> generating new records. I'm proposing to solve this problem in the future
> in another follow up FLIP, as a solution that works with a single split per
> operator is easier and already valuable for some of the users.
>
> What do you think about this proposal?
> Best, Piotrek
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> [2] https://github.com/pnowojski/flink/commits/aligned-sources
>

Reply via email to