Re:Re: [DISCUSS] FLIP-380: Support Full Partition Processing On Non-keyed DataStream

Wencong Liu Tue, 12 Dec 2023 03:02:16 -0800

Dear Yuxin,

Thank you for your support and for raising those insightful
queries.

Indeed, full window processing aligns more naturally with batch
scenarios. The primary reason we've limited it to batch processing
is state management in a streaming context. If we were to extend
support to streaming mode, the state backend would have to maintain
all the data received from the start to the finish of a task's
computation. Such an approach would reduce computational efficiency
and could impact job stability. In scenarios where full window
processing is applied, the user is typically unconcerned with
processing latency. Consequently, batch mode is often favored for
executing full window operations to get higher throughput, as it
eliminates the need for frequent checkpoints.
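
To make the buffering cost concrete, here is a minimal plain-Java sketch.
It is not Flink code: `BufferingSubtask`, `processElement`, `endInput` and
`bufferedRecords` are hypothetical names chosen for illustration only. It
models a single subtask that can emit nothing until its input ends, so
every record it has received must be retained until then.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java model of full-partition processing; an illustration, not Flink code. */
public class FullPartitionSketch {

    /** Hypothetical stand-in for one subtask that buffers its whole input. */
    static final class BufferingSubtask<T, R> {
        private final List<T> buffer = new ArrayList<>();
        private final Function<List<T>, R> partitionFunction;

        BufferingSubtask(Function<List<T>, R> partitionFunction) {
            this.partitionFunction = partitionFunction;
        }

        /** Called once per record; the record can only be stored, not emitted. */
        void processElement(T record) {
            buffer.add(record);
        }

        /** Number of records that would have to be held (in streaming mode: in state). */
        int bufferedRecords() {
            return buffer.size();
        }

        /** Called once when the input ends; only now is a result produced. */
        R endInput() {
            return partitionFunction.apply(buffer);
        }
    }

    public static void main(String[] args) {
        // The partition function sums every record seen by this subtask.
        BufferingSubtask<Integer, Integer> subtask =
                new BufferingSubtask<>(records ->
                        records.stream().mapToInt(Integer::intValue).sum());

        for (int i = 1; i <= 5; i++) {
            subtask.processElement(i);
        }

        System.out.println(subtask.bufferedRecords()); // prints 5
        System.out.println(subtask.endInput());        // prints 15
    }
}
```

The buffer grows with the input size, which is why keeping it in a
checkpointed state backend for an unbounded or very long-running job
would be costly.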

Additionally, the full window operators can perform failovers
in the same way as the current batch mode execution. In this setup,
intermediate results produced by an operator can be persisted
within the shuffle service. In the event of a failover, downstream
operators can then re-read the data from the shuffle service.

I appreciate your feedback on the API implementation 
section formatting. I will ensure that the text is updated.

Best regards,
Wencong

At 2023-12-12 15:48:23, "Yuxin Tan" <tanyuxinw...@gmail.com> wrote:
>Thanks Wencong for driving this FLIP.
>
>+1 from my side. It appears to significantly improve the handling
>of full-window data within the DataStream API. However, I do
>have a small question regarding the current limitation to batch
>processing: does this stem from performance-related considerations?
>Additionally, is there any possibility of supporting streaming
>in the future?
>
>In addition, some of the formatting in the `API implementation`
>section is not right (some lines exceed the text box); maybe we
>can update and fix it.
>
>Best,
>Yuxin
>
>
>weijie guo <guoweijieres...@gmail.com> wrote on Tue, Dec 12, 2023 at 15:09:
>
>> Thanks Wencong for driving this!
>>
>> I believe this is a useful feature, so +1 from my side.
>>
>> I only have one minor question about the exchange mode of the
>> `xxxPartition` methods. Does this mean the window operator must be
>> connected to the upstream operator via a forward edge? (Otherwise the
>> concept of mapPartition is a bit far-fetched.)
>>
>> Best regards,
>>
>> Weijie
>>
>>
>> Wencong Liu <liuwencle...@163.com> wrote on Fri, Dec 1, 2023 at 14:04:
>>
>> > Hi devs,
>> >
>> > I'm excited to propose a new FLIP[1] aimed at enhancing the DataStream
>> > API to support full window processing on non-keyed streams. This
>> > feature addresses the current limitation where non-keyed DataStreams
>> > cannot accumulate records per subtask for collective processing at the
>> > end of input.
>> >
>> > Key proposals include:
>> >
>> >
>> > 1. Introduction of PartitionWindowedStream allowing non-keyed
>> > DataStreams to be transformed for full window processing per subtask.
>> >
>> > 2. Addition of four new APIs - mapPartition, sortPartition, aggregate,
>> > and reduce - to enable powerful operations on PartitionWindowedStream.
>> >
>> > This initiative seeks to fill the gap left by the deprecation of the
>> > DataSet API, marrying its partition processing strengths with the
>> > dynamic capabilities of the DataStream API.
>> >
>> > Looking forward to your feedback on this FLIP.
>> >
>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream
>> >
>> > Best regards,
>> > Wencong Liu
>>
