Hi Tawfik,

Thanks for the proposal. I'm looking forward to your research paper!

You could also ask for edit permission for Flink Improvement Proposals [1]
and create a new proposal if you want to contribute this to the community
yourself.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best
Yun Tang
________________________________
From: yuxia <luoyu...@alumni.sjtu.edu.cn>
Sent: Wednesday, September 6, 2023 12:31
To: dev <dev@flink.apache.org>
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink

Hi, Tawfik Yasser.
Thanks for the proposal.
It sounds exciting. I can't wait for the research paper with more details.

Best regards,
Yuxia

----- Original Message -----
From: "David Morávek" <d...@apache.org>
To: "dev" <dev@flink.apache.org>
Sent: Tuesday, September 5, 2023 4:36:51 PM
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink

Hi Tawfik,

It's exciting to see any ongoing research that tries to push Flink forward!

To get the discussion started, can you please share your paper with the
community? Assessing the proposal without further context is tough.

Best,
D.

On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek <tyas...@nu.edu.eg>
wrote:

> Dear Apache Flink Development Team,
>
> I hope this email finds you well. I am writing to propose an exciting new
> feature for Apache Flink that has the potential to significantly enhance
> its capabilities in handling unbounded streams of events, particularly in
> the context of event-time windowing.
>
> As you may be aware, Apache Flink has been at the forefront of Big Data
> Stream processing engines, leveraging windowing techniques to manage
> unbounded event streams effectively. The accuracy of the results obtained
> from these streams relies heavily on the ability to gather all relevant
> input within a window. At the core of this process are watermarks, which
> serve as unique timestamps marking the progression of events in time.
>
> However, our analysis has revealed a critical issue with the current
> watermark generation method in Apache Flink. This method, which operates at
> the input stream level, exhibits a bias towards faster sub-streams,
> resulting in the unfortunate consequence of dropped events from slower
> sub-streams. Our investigations showed that Apache Flink's conventional
> watermark generation approach led to an alarming data loss of approximately
> 33% when 50% of the keys around the median experienced delays. This loss
> further escalated to over 37% when 50% of random keys were delayed.
>
> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>
> Moreover, we have conducted comprehensive comparative studies to evaluate
> the effectiveness of our strategy against the conventional watermark
> generation method, specifically in terms of event-time tracking accuracy.
>
> We believe that implementing keyed watermarks in Apache Flink can greatly
> enhance its performance and reliability, making it an even more valuable
> tool for organizations dealing with complex, high-throughput data
> processing tasks.
>
> We kindly request your consideration of this proposal. We would be eager
> to discuss further details, provide the full research paper, or collaborate
> closely to facilitate the integration of this feature into Apache Flink.
>
> Thank you for your time and attention to this proposal. We look forward to
> the opportunity to contribute to the continued success and evolution of
> Apache Flink.
>
> Best Regards,
>
> Tawfik Yasser
> Senior Teaching Assistant @ Nile University, Egypt
> Email: tyas...@nu.edu.eg
> LinkedIn: https://www.linkedin.com/in/tawfikyasser/
>
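
The bias described in the quoted proposal (a single stream-level watermark advancing with the fastest keys, so that events from slower keys arrive "late" and are dropped) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it is plain Python rather than Flink code, it is not the algorithm from the research paper (which has not been shared in this thread), and the function name `count_dropped`, the two keys, the window size, and the delay are all hypothetical values chosen for illustration. It assumes tumbling event-time windows, a bounded-out-of-orderness style watermark (highest timestamp seen minus a fixed bound), and no allowed lateness.

```python
from collections import defaultdict


def count_dropped(events, window_size, max_out_of_orderness, keyed):
    """Count events that arrive after the watermark has passed their window.

    events: list of (key, event_time) tuples in arrival order.
    keyed:  False -> one watermark for the whole stream (hypothetical "global" scope)
            True  -> one watermark per key (the idea behind "keyed watermarks")
    """
    # Highest event time seen so far, per watermark scope.
    max_ts = defaultdict(lambda: float("-inf"))
    dropped = 0
    for key, ts in events:
        scope = key if keyed else "global"
        # Simplified watermark: highest timestamp seen minus the allowed disorder.
        watermark = max_ts[scope] - max_out_of_orderness
        # End of the tumbling window this event belongs to.
        window_end = (ts // window_size + 1) * window_size
        if window_end <= watermark:
            dropped += 1  # window already closed: the event is late
        max_ts[scope] = max(max_ts[scope], ts)
    return dropped


# Two keys produce timestamps 0..29; the "slow" key arrives 15 steps late.
arrivals = [(t, "fast", t) for t in range(30)]
arrivals += [(t + 15, "slow", t) for t in range(30)]
events = [(key, ts) for _, key, ts in sorted(arrivals)]

global_dropped = count_dropped(events, window_size=10, max_out_of_orderness=2, keyed=False)
keyed_dropped = count_dropped(events, window_size=10, max_out_of_orderness=2, keyed=True)
print(global_dropped, keyed_dropped)
```

In this toy setup the stream-level watermark drops 20 of the 60 events, all from the slow key, while tracking one watermark per key drops none. The numbers say nothing about the 33% and 37% figures quoted above, which come from the authors' own experiments.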
