Hi Thomas and Akshay, Sorry for the late reply. Due to problems with other FLIPs, we had to scrap our plans to publish FLIP-182 in Flink 1.14. However as far as I know, Arvid is working on this right now, and we are tentatively aiming with this feature for the 1.15.0 release. I hope Arvid will be able to report something back in the next couple of weeks.
Best, Piotrek pon., 11 paź 2021 o 19:13 akshay padmanabhan <akshayiyan...@gmail.com> napisał(a): > > Thanks, Piotr and Arvid. Like Thomas even I'm interested in this feature > and was wondering if I can also contribute in some means in this effort. > > Thanks > Akshay > On 2021/09/08 15:44:01, Thomas Weise <t...@apache.org> wrote: > > Thank you Piotr and Arvid for the context. > > > > I'm interested in helping with this feature. If I'm able to contribute > > before October, then I will reach out. > > > > Thanks, > > Thomas > > > > > > On Tue, Sep 7, 2021 at 11:33 PM Arvid Heise <ar...@apache.org> wrote: > > > > > Just to clarify: I specifically asked Piotr to not persue the FLIP if > the > > > current state wouldn't make it in 1.14, such that someone else can > take it > > > over and expand it towards per-split alignment. Having a minimalistic > > > version in 1.14 + amendment FLIP in 1.15 would have been fine but now I > > > rather want to have it completely done in one go. > > > > > > I expect to work on it in October, so feel free to go ahead if you can > make > > > it sooner. > > > > > > Best, > > > > > > Arvid > > > On Tue, Sep 7, 2021 at 8:41 PM Piotr Nowojski <pnowoj...@apache.org> > > > wrote: > > > > > > > Hi Thomas, > > > > > > > > Unfortunately me/Arvid didn't have enough time to finish this off for > > > > 1.14.0 as we were firefighting other efforts and we have re-focused > on > > > > other more advanced FLIPs. We want to deliver it for 1.15 though. > I'm not > > > > sure, but I remember Arvid saying something that he would like to > > > actually > > > > take a look at this in 1.15 cycle with per-split throttling. If not, > at > > > the > > > > very least I would like to contribute the version without the > per-split > > > > logic, as this is almost done. > > > > > > > > Piotrek > > > > > > > > wt., 7 wrz 2021 o 19:18 Thomas Weise <t...@apache.org> napisał(a): > > > > > > > > > Hi, > > > > > > > > > > I wanted to check if there is active development on FLIP-182 and > what > > > the > > > > > target release for it might be? [1] still shows as under > discussion. > > > > > > > > > > Regarding the per-subtask vs. per-split limitation: I think it > will be > > > > > important that this eventually works per split, since in some > cases it > > > > > won't be practical to limit a subtask to a single split (think > > > > KafkaSource > > > > > reading from many topics with diverse volumes). > > > > > > > > > > Thanks, > > > > > Thomas > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources > > > > > > > > > > On Wed, Jul 21, 2021 at 4:48 AM Piotr Nowojski < > pnowoj...@apache.org> > > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > I would not fully advertise this before we have the second > part > > > > > > implemented as well. > > > > > > > > > > > > I'm not sure, maybe we could advertise with a big warning about > this > > > > > > limitation. I mean it's not as if this change would break > something. > > > At > > > > > > worst it just wouldn't fully solve the problem with multiple > splits > > > per > > > > > > single operator, but only limit the scope of that problem. At the > > > same > > > > > time > > > > > > I don't have strong feelings about this. If the consensus would > be to > > > > not > > > > > > advertise it, I'm also fine with it. Only in that case we should > > > > probably > > > > > > quickly follow up with the per split solution. > > > > > > > > > > > > Anyway, thanks for voicing your support and the discussions. I'm > > > going > > > > to > > > > > > start a voting thread for this feature soon. > > > > > > > > > > > > Best, > > > > > > Piotrek > > > > > > > > > > > > wt., 13 lip 2021 o 19:09 Stephan Ewen <se...@apache.org> > napisał(a): > > > > > > > > > > > > > @Eron Wright <eronwri...@gmail.com> The per-split watermarks > are > > > > the > > > > > > > default in the new source interface (FLIP-27) and come for > free if > > > > you > > > > > > use > > > > > > > the SplitReader. > > > > > > > > > > > > > > Based on that, it is also possible to unsubscribe individual > splits > > > > to > > > > > > > solve the alignment in the case where operators have multiple > > > splits > > > > > > > assigned. > > > > > > > Piotr and I already discussed that, but concluded that the > > > > > implementation > > > > > > > of that is largely orthogonal. > > > > > > > > > > > > > > I am a bit worried, though, that if we release and advertise > the > > > > > > alignment > > > > > > > without handling this case, we create a surprise for quite a > few > > > > users. > > > > > > > While this is admittedly valuable for some users, I think we > need > > > to > > > > > > > position this accordingly. I would not fully advertise this > before > > > we > > > > > > have > > > > > > > the second part implemented as well. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Jul 12, 2021 at 7:18 PM Eron Wright < > > > ewri...@streamnative.io > > > > > > > .invalid> > > > > > > > wrote: > > > > > > > > > > > > > > > The notion of per-split watermarks seems quite interesting. > I > > > > think > > > > > > the > > > > > > > > idleness feature could benefit from a per-split approach too, > > > > because > > > > > > > > idleness is typically related to whether any splits are > assigned > > > > to a > > > > > > > given > > > > > > > > operator instance. > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Jul 12, 2021 at 3:06 AM 刘建刚 < > liujiangangp...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > > +1 for the source watermark alignment. > > > > > > > > > In the previous flink version, the source connectors are > > > > different > > > > > in > > > > > > > > > implementation and it is hard to make this feature. When > the > > > > > consumed > > > > > > > > data > > > > > > > > > is not aligned or consuming history data, it is very easy > to > > > > cause > > > > > > the > > > > > > > > > unalignment. Source alignment can resolve many unstable > > > problems. > > > > > > > > > > > > > > > > > > Seth Wiesman <sjwies...@gmail.com> 于2021年7月9日周五 下午11:25写道: > > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > In my opinion, this limitation is perfectly fine for the > MVP. > > > > > > > Watermark > > > > > > > > > > alignment is a long-standing issue and this already > moves the > > > > > ball > > > > > > so > > > > > > > > far > > > > > > > > > > forward. > > > > > > > > > > > > > > > > > > > > I don't expect this will cause many issues in practice, > as I > > > > > > > understand > > > > > > > > > it > > > > > > > > > > the FileSource always processes one split at a time, and > in > > > my > > > > > > > > > experience, > > > > > > > > > > 90% of Kafka users have a small number of partitions > scale > > > > their > > > > > > > > > pipelines > > > > > > > > > > to have one reader per partition. Obviously, there are > > > > > larger-scale > > > > > > > > Kafka > > > > > > > > > > topics and more sources that will be ported over in the > > > future > > > > > but > > > > > > I > > > > > > > > > think > > > > > > > > > > there is an implicit understanding that aligning sources > adds > > > > > > latency > > > > > > > > to > > > > > > > > > > pipelines, and we can frame the follow-up "per-split" > > > alignment > > > > > as > > > > > > an > > > > > > > > > > optimization. > > > > > > > > > > > > > > > > > > > > On Fri, Jul 9, 2021 at 6:40 AM Piotr Nowojski < > > > > > > > > piotr.nowoj...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hey! > > > > > > > > > > > > > > > > > > > > > > A couple of weeks ago me and Arvid Heise played around > with > > > > an > > > > > > idea > > > > > > > > to > > > > > > > > > > > address a long standing issue of Flink: lack of > > > > watermark/event > > > > > > > time > > > > > > > > > > > alignment between different parallel instances of > sources, > > > > that > > > > > > can > > > > > > > > > lead > > > > > > > > > > to > > > > > > > > > > > ever growing state size for downstream operators like > > > > > > > WindowOperator. > > > > > > > > > > > > > > > > > > > > > > We had an impression that this is relatively low > hanging > > > > fruit > > > > > > that > > > > > > > > can > > > > > > > > > > be > > > > > > > > > > > quite easily implemented - at least partially (the > first > > > part > > > > > > > > mentioned > > > > > > > > > > in > > > > > > > > > > > the FLIP document). I have written down our proposal > [1] > > > and > > > > > you > > > > > > > can > > > > > > > > > also > > > > > > > > > > > check out our PoC that we have implemented [2]. > > > > > > > > > > > > > > > > > > > > > > We think that this is a quite easy proposal, that has > been > > > in > > > > > > large > > > > > > > > > part > > > > > > > > > > > already implemented. There is one obvious limitation > of our > > > > > PoC. > > > > > > > > Namely > > > > > > > > > > we > > > > > > > > > > > can only easily block individual SourceOperators. This > > > works > > > > > > > > perfectly > > > > > > > > > > fine > > > > > > > > > > > as long as there is at most one split per > SourceOperator. > > > > > However > > > > > > > it > > > > > > > > > > > doesn't work with multiple splits. In that case, if a > > > single > > > > > > > > > > > `SourceOperator` is responsible for processing both the > > > least > > > > > and > > > > > > > the > > > > > > > > > > most > > > > > > > > > > > advanced splits, we won't be able to block this most > > > advanced > > > > > > split > > > > > > > > for > > > > > > > > > > > generating new records. I'm proposing to solve this > problem > > > > in > > > > > > the > > > > > > > > > future > > > > > > > > > > > in another follow up FLIP, as a solution that works > with a > > > > > single > > > > > > > > split > > > > > > > > > > per > > > > > > > > > > > operator is easier and already valuable for some of the > > > > users. > > > > > > > > > > > > > > > > > > > > > > What do you think about this proposal? > > > > > > > > > > > Best, Piotrek > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources > > > > > > > > > > > [2] > > > > https://github.com/pnowojski/flink/commits/aligned-sources > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >