+1

In my opinion, this limitation is perfectly fine for the MVP. Watermark alignment is a long-standing issue and this already moves the ball so far forward.
I don't expect this to cause many issues in practice. As I understand it, the FileSource always processes one split at a time, and in my experience 90% of Kafka users have a small number of partitions and scale their pipelines to have one reader per partition. Obviously, there are larger-scale Kafka topics, and more sources will be ported over in the future, but I think there is an implicit understanding that aligning sources adds latency to pipelines, and we can frame the follow-up "per-split" alignment as an optimization.

On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski <piotr.nowoj...@gmail.com> wrote:

> Hey!
>
> A couple of weeks ago me and Arvid Heise played around with an idea to
> address a long standing issue of Flink: lack of watermark/event time
> alignment between different parallel instances of sources, that can lead to
> ever growing state size for downstream operators like WindowOperator.
>
> We had an impression that this is relatively low hanging fruit that can be
> quite easily implemented - at least partially (the first part mentioned in
> the FLIP document). I have written down our proposal [1] and you can also
> check out our PoC that we have implemented [2].
>
> We think that this is a quite easy proposal, that has been in large part
> already implemented. There is one obvious limitation of our PoC. Namely we
> can only easily block individual SourceOperators. This works perfectly fine
> as long as there is at most one split per SourceOperator. However it
> doesn't work with multiple splits. In that case, if a single
> `SourceOperator` is responsible for processing both the least and the most
> advanced splits, we won't be able to block this most advanced split for
> generating new records. I'm proposing to solve this problem in the future
> in another follow up FLIP, as a solution that works with a single split per
> operator is easier and already valuable for some of the users.
>
> What do you think about this proposal?
>
> Best, Piotrek
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> [2] https://github.com/pnowojski/flink/commits/aligned-sources