[ 
https://issues.apache.org/jira/browse/FLINK-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

realdengziqi updated FLINK-26334:
---------------------------------
    Description: 
h2. issue

        Hello!

        When we were studying the flink source code, we found that there was a 
problem with its algorithm for calculating the window start time. When 
_timestamp - offset + windowSize < 0_ , the element will be incorrectly 
allocated to a window with a WindowSize larger than its own timestamp.

        The problem is in 
_org.apache.flink.streaming.api.windowing.windows.TimeWindow_
{code:java}
public static long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    return timestamp - (timestamp - offset + windowSize) % windowSize;
} {code}
_!image-2022-03-04-11-28-26-616.png|width=710,height=251!_

        We believe that this violates the constraints between time and window. 
*That is, an element should fall within a window whose start time is less than 
its own timestamp and whose end time is greater than its own timestamp.* 
However, the current situation is when {_}timestamp - offset + windowSize < 
0{_}, *the element falls into a future time window.*

        *You can reproduce this bug with the code at the end of the article.*
h2. Solution       

        In fact, the original algorithm is no problem in python, the key to 
this problem is the processing of the remainder operation by the programming 
language.

        We finally think that it should be modified to the following algorithm.
{code:java}
public static long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    return timestamp
            - (timestamp - offset) % windowSize
            - (windowSize & (timestamp - offset) >> 63);
} {code}
        _windowSize & (timestamp - offset) >> 63_ The function of this formula 
is to subtract windowSize from the overall operation result when {_}timestamp - 
offset<0{_}, otherwise do nothing. This way we can handle both positive and 
negative timestamps.

        Finally, the element can be assigned to the correct window.

!image-2022-03-04-11-37-10-035.png|width=712,height=284!

        This code can pass current unit tests.
h2. getWindowStartWithOffset methods in other packages

        I think that there should be many places in 
{_}getWindowStartWithOffset{_}. We searched for this method in the project and 
found that the problem of negative timestamps is handled in _flink.table._

        Below is their source code.

        
_{{org.apache.flink.table.runtime.operators.window.grouping.WindowsGrouping}}_
{code:java}
private long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    long remainder = (timestamp - offset) % windowSize;
    // handle both positive and negative cases
    if (remainder < 0) {
        return timestamp - (remainder + windowSize);
    } else {
        return timestamp - remainder;
    }
} {code}
h2. Can we make a pull request?

        If the community deems it necessary to revise it, hopefully this task 
can be handed over to us. Our members are all students who have just graduated 
from school, and it is a great encouragement for us to contribute code to flink.

        Thank you so much!

        From Deng Ziqi & Lin Wanni & Guo Yuanfang
----
----
 

  was:
h2. issue

        Hello!

        When we were studying the flink source code, we found that there was a 
problem with its algorithm for calculating the window start time. When 
_timestamp - offset + windowSize < 0_ , the element will be incorrectly 
allocated to a window with a WindowSize larger than its own timestamp.

        The problem is in 
_org.apache.flink.streaming.api.windowing.windows.TimeWindow_
{code:java}
public static long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    return timestamp - (timestamp - offset + windowSize) % windowSize;
} {code}
_!image-2022-03-04-11-28-26-616.png|width=710,height=251!_

        We believe that this violates the constraints between time and window. 
That is, an element should fall within a window whose start time is less than 
its own timestamp and whose end time is greater than its own timestamp. 
However, the current situation is when {_}timestamp - offset + windowSize < 
0{_}, the element falls into a future time window.
h2. Solution       

        In fact, the original algorithm is no problem in python, the key to 
this problem is the processing of the remainder operation by the programming 
language.

        We finally think that it should be modified to the following algorithm.
{code:java}
public static long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    return timestamp
            - (timestamp - offset) % windowSize
            - (windowSize & (timestamp - offset) >> 63);
} {code}
        _windowSize & (timestamp - offset) >> 63_ The function of this formula 
is to subtract windowSize from the overall operation result when {_}timestamp - 
offset<0{_}, otherwise do nothing. This way we can handle both positive and 
negative timestamps.

        Finally, the element can be assigned to the correct window.

!image-2022-03-04-11-37-10-035.png|width=712,height=284!

        This code can pass current unit tests.
h2. getWindowStartWithOffset methods in other packages

        I think that there should be many places in 
{_}getWindowStartWithOffset{_}. We searched for this method in the project and 
found that the problem of negative timestamps is handled in _flink.table._

        Below is their source code.

        
_{{org.apache.flink.table.runtime.operators.window.grouping.WindowsGrouping}}_
{code:java}
private long getWindowStartWithOffset(long timestamp, long offset, long 
windowSize) {
    long remainder = (timestamp - offset) % windowSize;
    // handle both positive and negative cases
    if (remainder < 0) {
        return timestamp - (remainder + windowSize);
    } else {
        return timestamp - remainder;
    }
} {code}
h2. Can we make a pull request?

        If the community deems it necessary to revise it, hopefully this task 
can be handed over to us. Our members are all students who have just graduated 
from school, and it is a great encouragement for us to contribute code to flink.

        Thank you so much!

        From Deng Ziqi & Lin Wanni & Guo Yuanfang


> When timestamp - offset + windowSize < 0, elements cannot be assigned to the 
> correct window
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26334
>                 URL: https://issues.apache.org/jira/browse/FLINK-26334
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.15.0, 1.14.3
>         Environment: flink version 1.14.3
>            Reporter: realdengziqi
>            Priority: Major
>         Attachments: image-2022-03-04-11-28-26-616.png, 
> image-2022-03-04-11-37-10-035.png
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> h2. issue
>         Hello!
>         When we were studying the flink source code, we found that there was 
> a problem with its algorithm for calculating the window start time. When 
> _timestamp - offset + windowSize < 0_ , the element will be incorrectly 
> allocated to a window with a WindowSize larger than its own timestamp.
>         The problem is in 
> _org.apache.flink.streaming.api.windowing.windows.TimeWindow_
> {code:java}
> public static long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
>     return timestamp - (timestamp - offset + windowSize) % windowSize;
> } {code}
> _!image-2022-03-04-11-28-26-616.png|width=710,height=251!_
>         We believe that this violates the constraints between time and 
> window. *That is, an element should fall within a window whose start time is 
> less than its own timestamp and whose end time is greater than its own 
> timestamp.* However, the current situation is when {_}timestamp - offset + 
> windowSize < 0{_}, *the element falls into a future time window.*
>         *You can reproduce this bug with the code at the end of the article.*
> h2. Solution       
>         In fact, the original algorithm is no problem in python, the key to 
> this problem is the processing of the remainder operation by the programming 
> language.
>         We finally think that it should be modified to the following 
> algorithm.
> {code:java}
> public static long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
>     return timestamp
>             - (timestamp - offset) % windowSize
>             - (windowSize & (timestamp - offset) >> 63);
> } {code}
>         _windowSize & (timestamp - offset) >> 63_ The function of this 
> formula is to subtract windowSize from the overall operation result when 
> {_}timestamp - offset<0{_}, otherwise do nothing. This way we can handle both 
> positive and negative timestamps.
>         Finally, the element can be assigned to the correct window.
> !image-2022-03-04-11-37-10-035.png|width=712,height=284!
>         This code can pass current unit tests.
> h2. getWindowStartWithOffset methods in other packages
>         I think that there should be many places in 
> {_}getWindowStartWithOffset{_}. We searched for this method in the project 
> and found that the problem of negative timestamps is handled in _flink.table._
>         Below is their source code.
>         
> _{{org.apache.flink.table.runtime.operators.window.grouping.WindowsGrouping}}_
> {code:java}
> private long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
>     long remainder = (timestamp - offset) % windowSize;
>     // handle both positive and negative cases
>     if (remainder < 0) {
>         return timestamp - (remainder + windowSize);
>     } else {
>         return timestamp - remainder;
>     }
> } {code}
> h2. Can we make a pull request?
>         If the community deems it necessary to revise it, hopefully this task 
> can be handed over to us. Our members are all students who have just 
> graduated from school, and it is a great encouragement for us to contribute 
> code to flink.
>         Thank you so much!
>         From Deng Ziqi & Lin Wanni & Guo Yuanfang
> ----
> ----
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to