[ 
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Liu updated FLINK-35764:
----------------------------
    Description: 
*Description*

Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
(marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
follows:

- {{markStart()}} bumps {{currentMeasurementStartTS}} and 
{{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
- {{currentCount}} stores the time in the state within this update interval
- When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}} 
is added to the {{currentCount}}
- When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added to 
the {{{}currentCount{}}}, and the {{currentCount}} is used to update the gauge 
value

The intent is that a state can span across two update intervals (by calling 
{{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
incorrect, since the time between {{markStart()}} and {{update()}} will be 
counted in the first update interval by {{{}update(){}}}, and then in the 
second update interval by {{{}markEnd(){}}}, so the measurement will be larger 
than the correct value. The correct solution is to only add the time since 
{{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.

*Test case*
{quote}{{@Test  }}
{{void testUpdateBeforeMarkingEnd() {  }}
{{    ManualClock clock = new ManualClock(42_000_000);}}
{{    // this timer gauge measures 2 update intervals}}
{{    TimerGauge gauge = new TimerGauge(clock, 2 * 
View.UPDATE_INTERVAL_SECONDS);}}
{{    long UPDATE_INTERVAL_MILLIS = 
TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS);  // 5000 ms}}
{{    long SLEEP = 10;  // 10 ms}}

{{    // interval 1}}
{{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS); 
 // *(1)}}
{{    gauge.markStart();  }}
{{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
{{    gauge.update();}}

{{    // interval 2}}
{{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
{{    gauge.markEnd();}}
{{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS); 
 // *(1)}}
{{    gauge.update(); }}
{{ }}{{    }}
{{    // expected: 2, actual: 3}}
{{    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
View.UPDATE_INTERVAL_SECONDS);  }}
{{}}}{quote}
The current test cases in {{TimerGaugeTest}} do not catch this bug, because the 
assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code 
does not simulate the time passed outside the state ({{{}*(1){}}} in the code 
above).

  was:
Currently in {{{}TimerGauge{}}}, the {{currentMeasurement}} in {{markEnd}} is 
incorrectly set to the time since the last {{{}markStart{}}}. When calling 
{{{}markStart -> update -> markEnd{}}}, this will result in the time between 
{{markStart}} and {{update}} being counted twice. A piece of test code that 
reflects this scenario:
{code:java}
@Test  
void testUpdateBeforeMarkingEnd() {  
    ManualClock clock = new ManualClock(42_000_000);
    // time span = 2 intervals
    TimerGauge gauge = new TimerGauge(clock, 2 * View.UPDATE_INTERVAL_SECONDS); 
 

    // the event spans 2 intervals
    // interval 1
    gauge.markStart();  
    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  
    gauge.update();
    // interval 2
    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  
    gauge.markEnd();  
    gauge.update();  

    // expected: 2, actual: 3
    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
View.UPDATE_INTERVAL_SECONDS);  
}
{code}
Proposed changes:
 # Modify {{markEnd}} so that updates to {{currentCount}} and 
{{accumulatedCount}} resembles those in {{{}update{}}}.
 # Add the test case to {{{}TimeGaugeTest{}}}.


> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
>                 Key: FLINK-35764
>                 URL: https://issues.apache.org/jira/browse/FLINK-35764
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0
>            Reporter: Liu Liu
>            Priority: Major
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state 
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as 
> follows:
> - {{markStart()}} bumps {{currentMeasurementStartTS}} and 
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
> - {{currentCount}} stores the time in the state within this update interval
> - When calling {{{}markEnd(){}}}, the time since 
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
> - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added 
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the 
> gauge value
> The intent is that a state can span across two update intervals (by calling 
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be 
> incorrect, since the time between {{markStart()}} and {{update()}} will be 
> counted in the first update interval by {{{}update(){}}}, and then in the 
> second update interval by {{{}markEnd(){}}}, so the measurement will be 
> larger than the correct value. The correct solution is to only add the time 
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
> {quote}{{@Test  }}
> {{void testUpdateBeforeMarkingEnd() {  }}
> {{    ManualClock clock = new ManualClock(42_000_000);}}
> {{    // this timer gauge measures 2 update intervals}}
> {{    TimerGauge gauge = new TimerGauge(clock, 2 * 
> View.UPDATE_INTERVAL_SECONDS);}}
> {{    long UPDATE_INTERVAL_MILLIS = 
> TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS);  // 5000 ms}}
> {{    long SLEEP = 10;  // 10 ms}}
> {{    // interval 1}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.markStart();  }}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.update();}}
> {{    // interval 2}}
> {{    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);  }}
> {{    gauge.markEnd();}}
> {{    clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, 
> TimeUnit.MILLISECONDS);  // *(1)}}
> {{    gauge.update(); }}
> {{ }}{{    }}
> {{    // expected: 2, actual: 3}}
> {{    assertThat(gauge.getValue()).isEqualTo(SLEEP / 
> View.UPDATE_INTERVAL_SECONDS);  }}
> {{}}}{quote}
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because 
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and 
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in 
> the code above).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to