[
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Nowojski closed FLINK-35764.
----------------------------------
Fix Version/s: 2.0.0
1.19.2
1.20.1
Resolution: Fixed
Thanks for reporting and fixing the issue [~auroflow].
Merged to master as 4fae64cb011
Merged to release-1.20 as c4dd2a6550a
Merged to release-1.19 as 98f1349bb4b
> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
> Key: FLINK-35764
> URL: https://issues.apache.org/jira/browse/FLINK-35764
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0
> Reporter: Liu Liu
> Assignee: Liu Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0, 1.19.2, 1.20.1
>
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as
> follows:
> - {{markStart()}} bumps {{currentMeasurementStartTS}} and
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
> - {{currentCount}} stores the time in the state within this update interval
> - When calling {{{}markEnd(){}}}, the time since
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
> - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the
> gauge value
> The intent is that a state can span across two update intervals (by calling
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be
> incorrect, since the time between {{markStart()}} and {{update()}} will be
> counted in the first update interval by {{{}update(){}}}, and then in the
> second update interval by {{{}markEnd(){}}}, so the measurement will be
> larger than the correct value. The correct solution is to only add the time
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
> {{@Test }}
> {{void testUpdateBeforeMarkingEnd() { }}
> {{ ManualClock clock = new ManualClock(42_000_000);}}
> {{ // this timer gauge measures 2 update intervals}}
> {{ TimerGauge gauge = new TimerGauge(clock, 2 *
> View.UPDATE_INTERVAL_SECONDS);}}
> {{ long UPDATE_INTERVAL_MILLIS =
> TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS); // 5000 ms}}
> {{ long SLEEP = 10; // 10 ms}}
> {{ // interval 1}}
> {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP,
> TimeUnit.MILLISECONDS); // *(1)}}
> {{ gauge.markStart(); }}
> {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
> {{ gauge.update();}}
> {{ // interval 2}}
> {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
> {{ gauge.markEnd();}}
> {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP,
> TimeUnit.MILLISECONDS); // *(1)}}
> {{ gauge.update(); }}
> {{ // expected: 2, actual: 3}}
> {{ assertThat(gauge.getValue()).isEqualTo(SLEEP /
> View.UPDATE_INTERVAL_SECONDS); }}
> {{}}}
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in
> the code above).
> *Proposed changes*
> * In {{{}TimerGauge{}}}, only add the time since {{currentUpdateTS}} to
> {{currentCount}} in {{markEnd()}}
> * Add the test case above to {{{}TimerGaugeTest{}}}, and adjust other test
> cases
--
This message was sent by Atlassian Jira
(v8.20.10#820010)