[ https://issues.apache.org/jira/browse/FLINK-31119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690314#comment-17690314 ]
Matthias Pohl edited comment on FLINK-31119 at 2/17/23 10:57 AM:
-----------------------------------------------------------------
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46250&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8521
{code}
01:07:57,099 [ Receiver (1/6)#1] WARN org.apache.flink.runtime.taskmanager.Task [] - Receiver (1/6)#1 (e701d0caf3247ea7554acfb5dd8df541_cb0a5d4bcd60528ae7c4e8c99900a321_0_1) switched from RUNNING to FAILED with failure cause:
java.lang.NullPointerException: null
	at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:82) ~[test-classes/:?]
	at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
This one fails with a {{NullPointerException}} in the same method
[TestingAbstractInvokables.Receiver#invoke:71ff|https://github.com/apache/flink/blob/026675a5cb8a3704c51802fb549d6b0bc4759835/flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/TestingAbstractInvokables.java#L71].
Essentially, the data that has been received seems to be corrupted.
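To illustrate how a single validation step can fail in two different ways depending on what arrives, here is a minimal, self-contained sketch. It is not the code of {{TestingAbstractInvokables.Receiver}}: the class {{ReceiverCheckSketch}}, the {{IntRecord}} type, the helper methods and the expected values 42 and 1337 are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReceiverCheckSketch {

    /** Hypothetical stand-in for a boxed record type; not Flink's API. */
    static final class IntRecord {
        private final int value;

        IntRecord(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }
    }

    /** Returns the next record, or null once the input is exhausted. */
    static IntRecord next(Iterator<IntRecord> reader) {
        return reader.hasNext() ? reader.next() : null;
    }

    /**
     * Reads three records and validates them. The expected payload
     * (42, 1337, then end of input) is an assumption of this sketch.
     */
    static void check(Iterator<IntRecord> reader) throws Exception {
        IntRecord first = next(reader);
        IntRecord second = next(reader);
        IntRecord third = next(reader);
        // Dereferencing first or second throws a NullPointerException
        // if the input ended earlier than expected.
        if (first.getValue() != 42 || second.getValue() != 1337 || third != null) {
            throw new Exception("Wrong data received.");
        }
    }

    /** Runs the check and reports which of the failure modes occurred. */
    static String outcome(IntRecord... records) {
        try {
            check(Arrays.asList(records).iterator());
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(new IntRecord(42), new IntRecord(1337)));
        System.out.println(outcome(new IntRecord(42)));
        System.out.println(outcome(new IntRecord(42), new IntRecord(7)));
    }
}
```

In this sketch, an input that ends too early surfaces as a {{NullPointerException}}, while a record with an unexpected value surfaces as the "Wrong data received." exception. Whether the real test fails for the same reason is not established here; the sketch only shows that both symptoms are compatible with corrupted or incomplete input.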
> JobRecoveryITCase.testTaskFailureRecovery failed due to the job not finishing successfully
> ------------------------------------------------------------------------------------------
>
> Key: FLINK-31119
> URL: https://issues.apache.org/jira/browse/FLINK-31119
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Blocker
> Labels: test-stability
> Attachments: FLINK-31119.20230217.1.log, FLINK-31119.20230217.4.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46247&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=7c1d86e3-35bd-5fd5-3b7c-30c126a78702&l=8523
> {code}
> Feb 17 02:24:35 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 24.074 s <<< FAILURE! - in org.apache.flink.runtime.jobmaster.JobRecoveryITCase
> Feb 17 02:24:35 [ERROR] org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery Time elapsed: 20.981 s <<< FAILURE!
> Feb 17 02:24:35 java.lang.AssertionError:
> Feb 17 02:24:35
> Feb 17 02:24:35 Expected: is <true>
> Feb 17 02:24:35 but: was <false>
> Feb 17 02:24:35 at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> Feb 17 02:24:35 at org.junit.Assert.assertThat(Assert.java:964)
> Feb 17 02:24:35 at org.junit.Assert.assertThat(Assert.java:930)
> Feb 17 02:24:35 at org.apache.flink.runtime.jobmaster.JobRecoveryITCase.runTaskFailureRecoveryTest(JobRecoveryITCase.java:79)
> Feb 17 02:24:35 at org.apache.flink.runtime.jobmaster.JobRecoveryITCase.testTaskFailureRecovery(JobRecoveryITCase.java:63)
> Feb 17 02:24:35 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> The actual cause is that unexpected data was received:
> {code}
> 02:24:35,301 [ Receiver (5/5)#1] WARN org.apache.flink.runtime.taskmanager.Task [] - Receiver (5/5)#1 (d88e16a5e3c6f2c08cf3924d93ea18e2_28065fbb1d26fe99e018d3b846860dd3_4_1) switched from RUNNING to FAILED with failure cause:
> java.lang.Exception: Wrong data received.
> 	at org.apache.flink.runtime.jobmaster.TestingAbstractInvokables$Receiver.invoke(TestingAbstractInvokables.java:83) ~[test-classes/:?]
> 	at org.apache.flink.runtime.jobmaster.JobRecoveryITCase$FailingOnceReceiver.invoke(JobRecoveryITCase.java:126) ~[test-classes/:?]
> 	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[classes/:?]
> 	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) [classes/:?]
> 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [classes/:?]
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [classes/:?]
> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> {code}