[
https://issues.apache.org/jira/browse/FLINK-35764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liu Liu updated FLINK-35764:
----------------------------
Description:
*Description*
Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state
(marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as
follows:
- {{markStart()}} bumps {{currentMeasurementStartTS}} and
{{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
- {{currentCount}} stores the time in the state within this update interval
- When calling {{{}markEnd(){}}}, the time since {{currentMeasurementStartTS}}
is added to the {{currentCount}}
- When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added to
the {{{}currentCount{}}}, and the {{currentCount}} is used to update the gauge
value
The intent is that a state can span across two update intervals (by calling
{{{}markStart() -> update() -> markEnd(){}}}). However, the results will be
incorrect, since the time between {{markStart()}} and {{update()}} will be
counted in the first update interval by {{{}update(){}}}, and then in the
second update interval by {{{}markEnd(){}}}, so the measurement will be larger
than the correct value. The correct solution is to only add the time since
{{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
*Test case*
{quote}{{@Test }}
{{void testUpdateBeforeMarkingEnd() { }}
{{ ManualClock clock = new ManualClock(42_000_000);}}
{{ // this timer gauge measures 2 update intervals}}
{{ TimerGauge gauge = new TimerGauge(clock, 2 *
View.UPDATE_INTERVAL_SECONDS);}}
{{ long UPDATE_INTERVAL_MILLIS =
TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS); // 5000 ms}}
{{ long SLEEP = 10; // 10 ms}}
{{ // interval 1}}
{{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS);
// *(1)}}
{{ gauge.markStart(); }}
{{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
{{ gauge.update();}}
{{ // interval 2}}
{{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
{{ gauge.markEnd();}}
{{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP, TimeUnit.MILLISECONDS);
// *(1)}}
{{ gauge.update(); }}
{{ }}{{ }}
{{ // expected: 2, actual: 3}}
{{ assertThat(gauge.getValue()).isEqualTo(SLEEP /
View.UPDATE_INTERVAL_SECONDS); }}
{{}}}{quote}
The current test cases in {{TimerGaugeTest}} do not catch this bug, because the
assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and the code
does not simulate the time passed outside the state ({{{}*(1){}}} in the code
above).
was:
Currently in {{{}TimerGauge{}}}, the {{currentMeasurement}} in {{markEnd}} is
incorrectly set to the time since the last {{{}markStart{}}}. When calling
{{{}markStart -> update -> markEnd{}}}, this will result in the time between
{{markStart}} and {{update}} being counted twice. A piece of test code that
reflects this scenario:
{code:java}
@Test
void testUpdateBeforeMarkingEnd() {
ManualClock clock = new ManualClock(42_000_000);
// time span = 2 intervals
TimerGauge gauge = new TimerGauge(clock, 2 * View.UPDATE_INTERVAL_SECONDS);
// the event spans 2 intervals
// interval 1
gauge.markStart();
clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);
gauge.update();
// interval 2
clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);
gauge.markEnd();
gauge.update();
// expected: 2, actual: 3
assertThat(gauge.getValue()).isEqualTo(SLEEP /
View.UPDATE_INTERVAL_SECONDS);
}
{code}
Proposed changes:
# Modify {{markEnd}} so that updates to {{currentCount}} and
{{accumulatedCount}} resembles those in {{{}update{}}}.
# Add the test case to {{{}TimeGaugeTest{}}}.
> TimerGauge is incorrect when update is called during a measurement
> ------------------------------------------------------------------
>
> Key: FLINK-35764
> URL: https://issues.apache.org/jira/browse/FLINK-35764
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0
> Reporter: Liu Liu
> Priority: Major
>
> *Description*
> Currently in {{{}TimerGauge{}}}, the timer measures the time spent in a state
> (marked by {{markStart()}} and {{{}markEnd(){}}}). The current logic is as
> follows:
> - {{markStart()}} bumps {{currentMeasurementStartTS}} and
> {{{}currentUpdateTS{}}}, while {{update()}} bumps {{currentUpdateTS}}
> - {{currentCount}} stores the time in the state within this update interval
> - When calling {{{}markEnd(){}}}, the time since
> {{currentMeasurementStartTS}} is added to the {{currentCount}}
> - When calling {{{}update(){}}}, the time since {{currentUpdateTS}} is added
> to the {{{}currentCount{}}}, and the {{currentCount}} is used to update the
> gauge value
> The intent is that a state can span across two update intervals (by calling
> {{{}markStart() -> update() -> markEnd(){}}}). However, the results will be
> incorrect, since the time between {{markStart()}} and {{update()}} will be
> counted in the first update interval by {{{}update(){}}}, and then in the
> second update interval by {{{}markEnd(){}}}, so the measurement will be
> larger than the correct value. The correct solution is to only add the time
> since {{currentUpdateTS}} to {{currentCount}} in {{{}markEnd(){}}}.
> *Test case*
> {quote}{{@Test }}
> {{void testUpdateBeforeMarkingEnd() { }}
> {{ ManualClock clock = new ManualClock(42_000_000);}}
> {{ // this timer gauge measures 2 update intervals}}
> {{ TimerGauge gauge = new TimerGauge(clock, 2 *
> View.UPDATE_INTERVAL_SECONDS);}}
> {{ long UPDATE_INTERVAL_MILLIS =
> TimeUnit.SECONDS.toMillis(View.UPDATE_INTERVAL_SECONDS); // 5000 ms}}
> {{ long SLEEP = 10; // 10 ms}}
> {{ // interval 1}}
> {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP,
> TimeUnit.MILLISECONDS); // *(1)}}
> {{ gauge.markStart(); }}
> {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
> {{ gauge.update();}}
> {{ // interval 2}}
> {{ clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS); }}
> {{ gauge.markEnd();}}
> {{ clock.advanceTime(UPDATE_INTERVAL_MILLIS - SLEEP,
> TimeUnit.MILLISECONDS); // *(1)}}
> {{ gauge.update(); }}
> {{ }}{{ }}
> {{ // expected: 2, actual: 3}}
> {{ assertThat(gauge.getValue()).isEqualTo(SLEEP /
> View.UPDATE_INTERVAL_SECONDS); }}
> {{}}}{quote}
> The current test cases in {{TimerGaugeTest}} do not catch this bug, because
> the assert condition is (conveniently) {{{}isGreaterThanOrEqualTo{}}}, and
> the code does not simulate the time passed outside the state ({{{}*(1){}}} in
> the code above).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)