[
https://issues.apache.org/jira/browse/FLINK-19864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224634#comment-17224634
]
Chesnay Schepler commented on FLINK-19864:
------------------------------------------
From what I can tell the metrics work as intended, and this test failure also
cannot be explained with threading since the long field in the watermark metric is
volatile. I couldn't reproduce it locally after ~40k executions.
The only theoretical possibility I can come up with is that the task crashed
while consuming the input, emptying the queue but not updating metrics.
The test uses {{StreamTaskTestHarness#waitForInputProcessing}} to wait until the
task has finished processing the input. This method roughly does the
following:
{code}
while (true) {
    checkForErrorInTaskThread();
    if (allInputConsumed()) {
        break;
    }
}
{code}
If the task consumes the remaining data after checkForErrorInTaskThread() has
returned, and then fails before reaching the point where the metrics are updated,
the loop exits normally and the failure is not noticed anywhere.
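To make the suspected window concrete, here is a minimal, single-threaded sketch of that interleaving. It is not the harness code: {{FakeTask}}, {{consumeThenCrash}}, {{waitCheckFirst}} and {{waitWithRecheck}} are hypothetical names, and the task thread is simulated by a hook that runs between the two checks so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

public class WaitForInputSketch {

    /** Hypothetical stand-in for the task under test; not a Flink class. */
    static final class FakeTask {
        final Queue<Long> input = new ArrayDeque<>();
        final AtomicReference<Throwable> error = new AtomicReference<>();
        // Starts at Long.MIN_VALUE, the value reported in the failing assertion.
        volatile long watermarkMetric = Long.MIN_VALUE;

        /** Takes one element off the queue, then fails before the metric update. */
        void consumeThenCrash() {
            Long element = input.poll();
            error.set(new RuntimeException("failed after consuming " + element));
            // The metric update would happen here, but is never reached.
        }

        boolean allInputConsumed() {
            return input.isEmpty();
        }

        void checkForError() {
            Throwable t = error.get();
            if (t != null) {
                throw new IllegalStateException("task failed", t);
            }
        }
    }

    /** Same shape as the loop above: error check first, then the emptiness check. */
    static void waitCheckFirst(FakeTask task, Runnable taskThreadStep) {
        while (true) {
            task.checkForError();
            taskThreadStep.run(); // simulates the task thread running in the window
            if (task.allInputConsumed()) {
                break;
            }
        }
    }

    /** Assumed variant: check for errors once more after the input looks consumed. */
    static void waitWithRecheck(FakeTask task, Runnable taskThreadStep) {
        waitCheckFirst(task, taskThreadStep);
        task.checkForError();
    }

    /** True if the loop returns normally although the task failed without updating the metric. */
    static boolean raceGoesUnnoticed() {
        FakeTask task = new FakeTask();
        task.input.add(1L);
        waitCheckFirst(task, task::consumeThenCrash);
        return task.error.get() != null && task.watermarkMetric == Long.MIN_VALUE;
    }

    /** True if the extra check surfaces the failure in the same interleaving. */
    static boolean recheckCatchesFailure() {
        FakeTask task = new FakeTask();
        task.input.add(1L);
        try {
            waitWithRecheck(task, task::consumeThenCrash);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("race unnoticed by check-first loop: " + raceGoesUnnoticed());
        System.out.println("failure caught with re-check: " + recheckCatchesFailure());
    }
}
```

In this sketch the extra check only helps because the failure is recorded before the queue is observed as empty. If a real task could record its failure later than that, a re-check would narrow the window rather than close it.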
> TwoInputStreamTaskTest.testWatermarkMetrics failed with "expected:<1> but
> was:<-9223372036854775808>"
> -----------------------------------------------------------------------------------------------------
>
> Key: FLINK-19864
> URL: https://issues.apache.org/jira/browse/FLINK-19864
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.12.0
> Reporter: Dian Fu
> Priority: Critical
> Labels: test-stability
> Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8541&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=7c61167f-30b3-5893-cc38-a9e3d057e392
> {code}
> 2020-10-28T22:40:44.2528420Z [ERROR] testWatermarkMetrics(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest) Time elapsed: 1.528 s <<< FAILURE!
> 2020-10-28T22:40:44.2529225Z java.lang.AssertionError: expected:<1> but was:<-9223372036854775808>
> 2020-10-28T22:40:44.2541228Z     at org.junit.Assert.fail(Assert.java:88)
> 2020-10-28T22:40:44.2542157Z     at org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-10-28T22:40:44.2542954Z     at org.junit.Assert.assertEquals(Assert.java:645)
> 2020-10-28T22:40:44.2543456Z     at org.junit.Assert.assertEquals(Assert.java:631)
> 2020-10-28T22:40:44.2544002Z     at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkMetrics(TwoInputStreamTaskTest.java:540)
> {code}