[ https://issues.apache.org/jira/browse/FLINK-24163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478366#comment-17478366 ]
Yun Gao edited comment on FLINK-24163 at 1/19/22, 7:18 AM:
-----------------------------------------------------------

This seems to be due to a different reason. Hi [~roman], bisecting the commits, it seems that since https://issues.apache.org/jira/browse/FLINK-25395 the running time of PartiallyFinishedSourcesITCase#test[complex graph SINGLE_SUBTASK, failover: true, strategy: region] has increased from 2s to about 1 minute. The case is blocked on restoring state after failover:

{code:java}
"transform-2-keyed (1/4)#1" #1517 prio=5 os_prio=31 tid=0x00007f862136a000 nid=0x10423 runnable [0x0000700011fee000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:73)
	at org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:60)
	at org.apache.flink.runtime.util.ForwardingInputStream.read(ForwardingInputStream.java:52)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.flink.api.java.typeutils.runtime.DataInputViewStream.read(DataInputViewStream.java:68)
	at com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
	at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:77)
	at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:598)
	at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:576)
	at com.esotericsoftware.kryo.io.Input.readString(Input.java:454)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:177)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:166)
	at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:730)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
	at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:402)
	at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:78)
	at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$$Lambda$2196/1169355256.readElement(Unknown Source)
	at org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
	at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:258)
	at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:220)
	at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:166)
	at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:62)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:169)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:106)
	at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:137)
{code}

Could you help take a look? The commit before the PR on the master branch is 265a0a0708ae743c63505bb02e0659984a565fbb and the commit right after the PR is 4691b66545010ed812624a259869c7a522663720.

> PartiallyFinishedSourcesITCase fails due to timeout
> ---------------------------------------------------
>
>                 Key: FLINK-24163
>                 URL: https://issues.apache.org/jira/browse/FLINK-24163
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Xintong Song
>            Assignee: Yun Gao
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23529&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=10996
> {code}
> Sep 04 04:35:28 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 155.236 s <<< FAILURE! - in org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 04 04:35:28 [ERROR] test[complex graph ALL_SUBTASKS, failover: false]  Time elapsed: 65.999 s  <<< ERROR!
> Sep 04 04:35:28 java.util.concurrent.TimeoutException: Condition was not met in given timeout.
> Sep 04 04:35:28 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:164)
> Sep 04 04:35:28 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:142)
> Sep 04 04:35:28 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:134)
> Sep 04 04:35:28 	at org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:297)
> Sep 04 04:35:28 	at org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:219)
> Sep 04 04:35:28 	at org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:82)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)