It's great to see the discussion about what we need to improve before users
can (completely) switch from the DataSet API to the DataStream API. I feel
that these improvements will happen faster (only) when we seriously prepare
to remove the DataSet APIs with a target release, just like what we are
doing now. And the same applies to the SinkV1-related discussions (smile).

I support Xintong's opinion on keeping "Remove the DataSet APIs" a
must-have item. At the same time, I support Yuxia's opinion that we should
explicitly let our users know how to migrate their existing DataSet API
based applications, meaning that the guideline Xintong mentioned is a
must-have (rather than best effort) before removing the DataSet APIs.

Best Regards,
Yu


On Wed, 12 Jul 2023 at 14:00, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> Thanks Xintong for the clarification. A guideline to help users migrate
> from DataSet to DataStream will definitely be helpful.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Xintong Song" <tonysong...@gmail.com>
> To: "dev" <dev@flink.apache.org>
> Sent: Wednesday, July 12, 2023 11:40:12 AM
> Subject: Re: [VOTE] Release 2.0 must-have work items
>
> @Yuxia,
>
> We are aware of the issue that you mentioned. Actually, I don't think the
> DataStream API can cover everything in the DataSet API in exactly the same
> way, because the fundamental model, concepts and primitives of the two sets
> of APIs are completely different. Many of the DataSet APIs, especially
> those accessing the full data set at once, do not fit in the DataStream
> concepts at all. I think what's important is that users can achieve the
> same function, even if they may need to code in a different way.
>
> We have gone through all the existing DataSet APIs, and categorized them
> into 3 kinds:
> - APIs that are well supported by DataStream API as is. E.g., map, reduce
> on grouped dataset, etc.
> - APIs that can be achieved by DataStream API as is, but with a price
> (programming complexity, or computation efficiency). E.g., reduce on full
> dataset, sort partition, etc. Admittedly, there is room for improvement on
> these. We may keep improving these for the DataStream API, or we can
> concentrate on supporting them better in the new ProcessFunction API.
> Either way, I don't think we should block the retiring of DataSet API on
> them.
> - There are also a few APIs that cannot be supported by the DataStream API
> as is, unless users write their custom operators from the ground up. Only
> left/rightOuterJoin and combineGroup fall into this category. I think
> combineGroup is probably not a problem, because this is more like a
> variant of reduceGroup that allows the framework to execute more
> efficiently. As for the outer joins, depending on how badly this is needed,
> it can be supported by emitting the non-joined entries upon triggering a
> window join.
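
The outer-join idea above can be sketched in plain Java, independent of any
Flink API. This is only an illustration, assuming both inputs of one window
have already been buffered by the time the window fires. The class name
WindowLeftOuterJoinSketch, the method fireWindow, and the "left|right" output
format are hypothetical and are not part of Flink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowLeftOuterJoinSketch {

    /**
     * Joins the buffered contents of one window when it fires.
     * Each entry is (join key, payload). Matching pairs are emitted as
     * "left|right"; left entries without a match are emitted as "left|null",
     * which is the left-outer part of the join.
     */
    static List<String> fireWindow(List<Map.Entry<String, String>> left,
                                   List<Map.Entry<String, String>> right) {
        // Index the right side by key so each left entry can look up matches.
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (Map.Entry<String, String> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>())
                      .add(r.getValue());
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> l : left) {
            List<String> matches = rightByKey.get(l.getKey());
            if (matches == null) {
                // Non-joined entry: emit it anyway, padded with null.
                out.add(l.getValue() + "|null");
            } else {
                for (String r : matches) {
                    out.add(l.getValue() + "|" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left =
                List.of(Map.entry("a", "L1"), Map.entry("b", "L2"));
        List<Map.Entry<String, String>> right =
                List.of(Map.entry("a", "R1"));
        System.out.println(fireWindow(left, right)); // prints [L1|R1, L2|null]
    }
}
```

A right outer join follows by swapping the roles of the two inputs. In a real
job the buffering and firing would be handled by the framework's windowing,
which this sketch deliberately leaves out.
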
>
> We are also planning to draft a guideline to help users migrate from
> DataSet to DataStream, which should demonstrate how users can achieve
> things like sort-partition with the DataStream API.
>
> Last but not least, I'd like to point out that the decision to deprecate
> and eventually remove the DataSet API was approved in FLIP-131 [1], and all
> the prerequisites mentioned in the FLIP have been completed.
>
> Best,
>
> Xintong
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
>
>
>
> On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li <jingsongl...@gmail.com>
> wrote:
>
> > +1 to Leonard and Galen and Jing.
> >
> > About Source and Sink:
> > We're still missing quite a bit of work, including functionality, ease
> > of use, and bug fixes, and I'm not sure we'll be completely done by 2.0.
> > Until that's done, we won't be in a position to clean up the old APIs.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jul 12, 2023 at 9:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> wrote:
> > >
> > > Hi, Xintong.
> > > Sorry to disturb the voting. I just found an email [1] about the DataSet
> > > API from the flink-user-zh channel. And I think it's not just a single
> > > case, according to my observation.
> > >
> > > Removing DataSet is a must-have item in release-2.0. But as the user
> > > email said, if we remove DataSet, how can users implement
> > > Sort/PartitionBy, etc. as they did with DataSet?
> > > Will we also provide a similar API in DataStream, or something else,
> > > before we remove DataSet?
> > > Btw, as far as I can see, with regard to replacing DataSet with
> > > DataStream, DataStream is missing many APIs. I think it may well take
> > > much effort to fully cover the missing APIs.
> > >
> > > [1] https://lists.apache.org/thread/syjmt8f74gh8ok3z4lhgt95zl4dzn168
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Jing Ge" <j...@ververica.com.INVALID>
> > > To: "dev" <dev@flink.apache.org>
> > > Sent: Wednesday, July 12, 2023 1:23:40 AM
> > > Subject: Re: [VOTE] Release 2.0 must-have work items
> > >
> > > I agree with what Leonard said. There are actually more issues wrt the
> > > new Source and SinkV2 [1].
> > >
> > > Speaking of must-have vs nice-to-have, I think it depends on the
> > > priority. If removing them has higher priority, we should keep the
> > > related tasks as must-have and make sure enough effort is put into
> > > solving those issues, so that we are able to remove those APIs.
> > >
> > > Best regards,
> > > Jing
> > >
> > > [1] https://lists.apache.org/thread/90qc9nrlzf0vbvg92klzp9ftxxc43nbk
> > >
> > > On Tue, Jul 11, 2023 at 10:26 AM Leonard Xu <xbjt...@gmail.com> wrote:
> > >
> > > > > Thanks Xintong for driving this great work! But I have to give my
> > > > > -1 (binding) here:
> > > > >
> > > > > -1 to marking the "deprecate SourceFunction/SinkFunction/SinkV1" item
> > > > > as a must-have for release 2.0.
> > > >
> > > > I do a lot of connector work in the community, and I have two
> insights
> > > > from past experience:
> > > >
> > > > > 1. Many developers have reported that it is very difficult to migrate
> > > > > from SourceFunction to the new Source [1]. Migrating existing
> > > > > connectors after SourceFunction is deprecated is very difficult. Some
> > > > > developers (Flavio Pompermaier) reported that they gave up the
> > > > > migration because it was too complicated. I believe these are not
> > > > > isolated cases. This means that deprecating the SourceFunction-related
> > > > > interfaces requires community contributors to reduce the migration
> > > > > cost before starting the migration work.
> > > >
> > > > > 2. IIRC, the functionality of SinkV2 cannot currently cover
> > > > > SinkFunction, as described in FLIP-287 [2]. This means that no
> > > > > migration path exists after deprecating SinkFunction/SinkV1, so we
> > > > > cannot mark the related SinkFunction/SinkV1 interfaces as deprecated
> > > > > in 1.18.
> > > >
> > > > > Based on these two observations, I think we should not mark these
> > > > > interfaces as must-have in 2.0. Maintaining the two sets of
> > > > > source/sink interfaces is not a concern for me; users can choose
> > > > > which interface to implement according to their energy and needs.
> > > >
> > > > > Btw, some work items in 2.0 are marked as must-have, but no
> > > > > contributor has claimed them yet. I think this is a risk and hope the
> > > > > Release Managers can pay attention to it.
> > > >
> > > > Thank you all RMs for your work, sorry again for interrupting the
> vote
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > [1] https://lists.apache.org/thread/sqq26s9rorynr4vx4nhxz3fmmxpgtdqp
> > > > [2]
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > > >
> > > > > On Jul 11, 2023, at 4:11 PM, Yuan Mei <yuanmei.w...@gmail.com>
> > wrote:
> > > > >
> > > > > > On second thought, I think "Eager State Declaration" is probably
> > > > > > not a must-have.
> > > > > >
> > > > > > I was originally thinking it is a prerequisite for "state querying
> > > > > > for disaggregated state management".
> > > > > >
> > > > > > Since disaggregated state management itself is not a must-have,
> > > > > > "Eager State Declaration" is not either. We can downgrade it to
> > > > > > "nice to have" if there is no objection.
> > > > >
> > > > > Best
> > > > >
> > > > > Yuan
> > > > >
> > > > > On Mon, Jul 10, 2023 at 7:02 PM Jing Ge <j...@ververica.com.invalid
> >
> > > > wrote:
> > > > >
> > > > >> +1
> > > > >>
> > > > >> On Mon, Jul 10, 2023 at 12:52 PM Yu Li <car...@gmail.com> wrote:
> > > > >>
> > > > >>> +1 (binding)
> > > > >>>
> > > > >>> Thanks for driving this and great to see us moving forward.
> > > > >>>
> > > > >>> Best Regards,
> > > > >>> Yu
> > > > >>>
> > > > >>>
> > > > >>> On Mon, 10 Jul 2023 at 11:59, Feng Wang <wangfeng...@gmail.com>
> > wrote:
> > > > >>>
> > > > >>>> +1
> > > > >>>> Thanks for driving this, looking forward to the next stage of
> > flink.
> > > > >>>>
> > > > >>>> On Fri, Jul 7, 2023 at 5:31 PM Xintong Song <
> > tonysong...@gmail.com>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>>> Hi all,
> > > > >>>>>
> > > > >>>>> I'd like to start the VOTE for the must-have work items for
> > release
> > > > >> 2.0
> > > > >>>>> [1]. The corresponding discussion thread is [2].
> > > > >>>>>
> > > > > >>>>> Please note that once the vote is approved, any changes to the
> > > > > >>>>> must-have items (adding / removing must-have items, changing the
> > > > > >>>>> priority) require another vote. Assigning contributors /
> > > > > >>>>> reviewers, updating descriptions / progress, and changes to
> > > > > >>>>> nice-to-have items do not require another vote.
> > > > >>>>>
> > > > >>>>> The vote will be open until at least July 12, following the
> > consensus
> > > > >>>>> voting process. Votes of PMC members are binding.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>>
> > > > >>>>> Xintong
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> [1]
> > https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > >>>>>
> > > > >>>>> [2]
> > https://lists.apache.org/thread/l3dkdypyrovd3txzodn07lgdwtwvhgk4
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> >
>
