Re:[DISCUSS] FLIP-408: [Umbrella] Introduce DataStream API V2

Wencong Liu Wed, 24 Jan 2024 04:55:41 -0800

Hi Weijie,

Thank you for the effort you've put into the DataStream API! By reorganizing
and redesigning it, and by addressing some of its problematic designs, we can
make job development more efficient for developers. It also allows developers
to design more flexible Flink jobs to meet business requirements.

I have conducted a comprehensive review of the DataStream API design in
versions 1.18 and 1.19 and found quite a few functional defects, such as the
lack of corresponding APIs for batch processing scenarios. In the upcoming
1.20 version, I will further improve the DataStream API for batch computing
scenarios.

The issues in the old DataStream API (which can be referred to as V1) can be
addressed from a design perspective in the initial version of V2. I also hope
to have the opportunity to participate in the development of DataStream V2
and make my contribution.

Regarding FLIP-408, I have a question: The Processing TimerService is currently 
defined as one of the basic primitives, partly because it's understood that 
you have to choose between processing time and event time. 
The other part of the reason is that it needs to work based on the task's
mailbox thread model to avoid concurrency issues. Could you clarify the second
part of the reason?
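
To make the second part concrete, here is a minimal sketch of how the mailbox
idea can be read. It is only an illustration under my own assumptions:
MailboxSketch, processRecord and registerProcessingTimer are hypothetical
names, not Flink's actual classes or the API proposed in the FLIP. The timer
thread never runs user code itself; it only enqueues the callback into the
single task thread, so record processing and timer callbacks cannot race on
the same state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; these are not Flink's real classes.
public class MailboxSketch {
    // The single "task thread": all user code (records and timer callbacks) runs here.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    // A separate timer thread: it only enqueues mail and never runs user code.
    private final ScheduledExecutorService timerThread =
            Executors.newSingleThreadScheduledExecutor();
    // Plain, non-thread-safe state; it is only ever mutated by the mailbox thread.
    private final List<String> state = new ArrayList<>();

    // Record processing is submitted as mail to the task thread.
    public void processRecord(String record) {
        mailbox.execute(() -> state.add("record:" + record));
    }

    // When the timer fires, the callback is handed over to the mailbox
    // instead of being executed on the timer thread.
    public void registerProcessingTimer(long delayMillis, String name) {
        timerThread.schedule(
                () -> mailbox.execute(() -> state.add("timer:" + name)),
                delayMillis,
                TimeUnit.MILLISECONDS);
    }

    // Waits for pending timers and mail to drain, then returns the state.
    public List<String> shutdownAndGetState() {
        try {
            timerThread.shutdown();
            timerThread.awaitTermination(5, TimeUnit.SECONDS);
            mailbox.shutdown();
            mailbox.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return state;
    }
}
```

If the timer thread called the callback directly, the list above would need
its own synchronization. Is this the kind of concurrency issue the FLIP
refers to?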

Best,
Wencong Liu

At 2023-12-26 14:42:20, "weijie guo" wrote:
>Hi devs,
>
>
>I'd like to start a discussion about FLIP-408: [Umbrella] Introduce
>DataStream API V2 [1].
>
>
>The DataStream API is one of the two main APIs that Flink provides for
>writing data processing programs. As an API that was introduced
>practically on day one of the project and has evolved for nearly a
>decade, it shows more and more problems. Fixing these problems
>requires significant breaking changes, which makes an in-place
>refactor impractical. Therefore, we propose to introduce a new set of
>APIs, the DataStream API V2, to gradually replace the original
>DataStream API.
>
>
>The proposal to introduce a whole new set of APIs is complex and
>includes massive changes. We are planning to break it down into
>multiple sub-FLIPs for incremental discussion. This FLIP serves only
>as an umbrella, mainly focusing on motivation, goals, and overall
>planning. That is to say, more design and implementation details will
>be discussed in other FLIPs.
>
>
>It is hard to imagine the detailed design of the new API, or to give
>an opinion on it, if we only talk about this umbrella FLIP. Therefore,
>I have prepared two sub-FLIPs [2][3] at the same time, and their
>discussions will be posted later in separate threads.
>
>
>Looking forward to hearing from you, thanks!
>
>
>Best regards,
>
>Weijie
>
>
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
>
>[2]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
>
>
>[3]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
