Dear Yuxin,
Thank you for your support and for raising those insightful queries. Indeed, the full window processing aligns more naturally with batch scenarios. The primary reason we've limited it to batch processing is due to state management considerations in a streaming context. If we were to extend support to streaming mode, it would require the state backend maintaining all the data received from the start to the finish of a task's computation. Such an approach will lead to reduced computational efficiency and potentially impact job stability. In scenarios where full window processing is applied, the user is typically unconcerned with processing latency. Consequently, batch mode is often favored for executing full window operations to get higher throughput, as it eliminates the need for frequent checkpoints. Additionally, the full window operators can perform failovers similar to the current batch mode execution. In this setup, intermediate results produced by an operator can be persisted within the shuffle service. Subsequently, in the event of a failover, downstream operators have the capability to re-read the data from the shuffle service. I appreciate your feedback on the API implementation section formatting. I will ensure that the text is updated. Best regards, Wencong At 2023-12-12 15:48:23, "Yuxin Tan" <tanyuxinw...@gmail.com> wrote: >Thanks Wencong for driving this FLIP. > >+1 from my side. It appears to significantly improve the handling >of full-window data within the DataStream API. However, I do >have a small question regarding the current limitation to batch >processing: does this stem from performance-related considerations? >Additionally, is there any possibility that support for streaming >in the future? > >In addition, some format of the section `API implementation` is >not right (some lines have exceeded the text box), maybe we >can update and fix it. > >Best, >Yuxin > > >weijie guo <guoweijieres...@gmail.com> 于2023年12月12日周二 15:09写道: > >> Thanks Wencong for driving this! >> >> I believe this is a useful feature, so +1 from my side. >> >> I only have one minor question about the exchange mode of `xxxPartition` >> method. Does this means the window operator must be connected to the >> upstream operator in forward edge (otherwise the concept of mapPartition is >> a bit far-fetched). >> >> Best regards, >> >> Weijie >> >> >> Wencong Liu <liuwencle...@163.com> 于2023年12月1日周五 14:04写道: >> >> > Hi devs, >> > >> > I'm excited to propose a new FLIP[1] aimed at enhancing the DataStream >> API >> > >> > to support full window processing on non-keyed streams. This feature >> > addresses >> > the current limitation where non-keyed DataStreams cannot accumulate >> > records >> > per subtask for collective processing at the end of input. >> > >> > Key proposals include: >> > >> > >> > 1. Introduction of PartitionWindowedStream allowing non-keyed DataStreams >> > to >> > be transformed for full window processing per subtask. >> > >> > 2. Addition of four new APIs - mapPartition, sortPartition, aggregate, >> and >> > reduce >> > - to enable powerful operations on PartitionWindowedStream. >> > >> > This initiative seeks to fill the gap left by the deprecation of the >> > DataSet API, >> > marrying its partition processing strengths with the dynamic capabilities >> > of the DataStream API. >> > >> > Looking forward to your feedback on this FLIP. >> > >> > [1] >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream >> > >> > Best regards, >> > Wencong Liu >>