[ 
https://issues.apache.org/jira/browse/FLINK-19864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224634#comment-17224634
 ] 

Chesnay Schepler commented on FLINK-19864:
------------------------------------------

From what I can tell, the metrics work as intended, and this test failure also 
cannot be explained by threading, since the long field in the watermark metric 
is volatile. I couldn't reproduce it locally after ~40k executions.

The only theoretical possibility I can come up with is that the task crashed 
while consuming the input, emptying the queue but not updating metrics.

The test uses {{StreamTaskTestHarness#waitForInputProcessing}} to wait until 
the task has finished processing the input. This method roughly does the 
following:
{code}
while (true) {
        checkForErrorInTaskThread()
        if (allInputConsumed()) {
                break
        }
}
{code}
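If the task consumes data after checkForErrorInTaskThread() and then fails 
before reaching the point where the metrics are updated, the failure is not 
noticed anywhere: the loop sees that all input was consumed and exits without 
checking for errors again.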
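
To make the interleaving concrete, here is a minimal, single-threaded sketch. 
It is not the actual StreamTaskTestHarness code: WaitLoopSketch, waitRacy, 
waitWithRecheck and checkForError are hypothetical names, and the task 
failure is simulated by a supplier that records an error at the moment it 
reports the input as consumed. It only illustrates why a loop of the shape 
above can miss a late failure, and how one extra check after the loop would 
surface it in this particular interleaving.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the wait loop; not the real Flink test harness. */
final class WaitLoopSketch {

    /** Throws if the (simulated) task thread has recorded a failure. */
    static void checkForError(AtomicReference<Throwable> taskError) {
        Throwable t = taskError.get();
        if (t != null) {
            throw new IllegalStateException("error in task thread", t);
        }
    }

    /** Same shape as the loop above: errors are checked only before the break. */
    static void waitRacy(BooleanSupplier allInputConsumed,
                         AtomicReference<Throwable> taskError) {
        while (true) {
            checkForError(taskError);
            if (allInputConsumed.getAsBoolean()) {
                break;
            }
            Thread.yield();
        }
    }

    /** Variant that looks for an error once more after the input is consumed. */
    static void waitWithRecheck(BooleanSupplier allInputConsumed,
                                AtomicReference<Throwable> taskError) {
        waitRacy(allInputConsumed, taskError);
        // The task may have failed after the last in-loop check.
        checkForError(taskError);
    }

    public static void main(String[] args) {
        // Simulated interleaving: the task empties its input and then fails,
        // all after the loop's error check has already passed.
        AtomicReference<Throwable> err1 = new AtomicReference<>();
        waitRacy(() -> { err1.set(new RuntimeException("boom")); return true; }, err1);
        System.out.println("racy loop: error missed");

        AtomicReference<Throwable> err2 = new AtomicReference<>();
        try {
            waitWithRecheck(() -> { err2.set(new RuntimeException("boom")); return true; }, err2);
        } catch (IllegalStateException e) {
            System.out.println("re-checking loop: error reported");
        }
    }
}
```

Note that in a real multi-threaded run the extra check only narrows the 
window. If the task has emptied the queue but not yet recorded its failure, 
the re-check can still pass, so fully closing the gap would also require 
waiting for the task to become idle or to finish.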
 

> TwoInputStreamTaskTest.testWatermarkMetrics failed with "expected:<1> but 
> was:<-9223372036854775808>"
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19864
>                 URL: https://issues.apache.org/jira/browse/FLINK-19864
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.12.0
>            Reporter: Dian Fu
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8541&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=7c61167f-30b3-5893-cc38-a9e3d057e392
> {code}
> 2020-10-28T22:40:44.2528420Z [ERROR] testWatermarkMetrics(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest) Time elapsed: 1.528 s <<< FAILURE!
> 2020-10-28T22:40:44.2529225Z java.lang.AssertionError: expected:<1> but was:<-9223372036854775808>
> 2020-10-28T22:40:44.2541228Z at org.junit.Assert.fail(Assert.java:88)
> 2020-10-28T22:40:44.2542157Z at org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-10-28T22:40:44.2542954Z at org.junit.Assert.assertEquals(Assert.java:645)
> 2020-10-28T22:40:44.2543456Z at org.junit.Assert.assertEquals(Assert.java:631)
> 2020-10-28T22:40:44.2544002Z at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testWatermarkMetrics(TwoInputStreamTaskTest.java:540)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
