[
https://issues.apache.org/jira/browse/FLINK-24163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478366#comment-17478366
]
Yun Gao edited comment on FLINK-24163 at 1/19/22, 7:18 AM:
-----------------------------------------------------------
This seems to be due to a different reason.
Hi [~roman], a binary search over the commits suggests that with
https://issues.apache.org/jira/browse/FLINK-25395 the running time of
PartiallyFinishedSourcesITCase#test[complex graph SINGLE_SUBTASK, failover:
true, strategy: region] has increased from 2s to about 1 minute. The test is
blocked on restoring state after failover:
{code:java}
"transform-2-keyed (1/4)#1" #1517 prio=5 os_prio=31 tid=0x00007f862136a000 nid=0x10423 runnable [0x0000700011fee000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:73)
at org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:60)
at org.apache.flink.runtime.util.ForwardingInputStream.read(ForwardingInputStream.java:52)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.flink.api.java.typeutils.runtime.DataInputViewStream.read(DataInputViewStream.java:68)
at com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:77)
at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:598)
at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:576)
at com.esotericsoftware.kryo.io.Input.readString(Input.java:454)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:177)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:166)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:730)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:402)
at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:78)
at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$$Lambda$2196/1169355256.readElement(Unknown Source)
at org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:258)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:220)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:166)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:62)
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:169)
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:106)
at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:137)
{code}
Could you help take a look?
The commit before the PR on the master branch is
265a0a0708ae743c63505bb02e0659984a565fbb, and the commit right after the PR is
4691b66545010ed812624a259869c7a522663720.
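For reference, the binary search mentioned above can be sketched roughly as follows. This is only an illustration and not the procedure actually used: the class name FirstSlowCommit, the commit labels c0..c5, the timings and the 10s threshold are all made up, and in practice each probe would mean checking out the commit and timing the test. It assumes the commits are ordered oldest to newest, the first one is known to be fast, the last one is known to be slow, and every commit after the first slow one is also slow.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FirstSlowCommit {

    /**
     * Returns the index of the first commit for which isSlow holds.
     * Assumes commits.get(0) is fast, the last commit is slow, and
     * slowness never disappears again in later commits.
     */
    public static <T> int findFirstSlow(List<T> commits, Predicate<T> isSlow) {
        if (commits.size() < 2) {
            throw new IllegalArgumentException("need a known-fast and a known-slow commit");
        }
        int fast = 0;                  // index of a commit known to be fast
        int slow = commits.size() - 1; // index of a commit known to be slow
        while (slow - fast > 1) {
            int mid = fast + (slow - fast) / 2;
            if (isSlow.test(commits.get(mid))) {
                slow = mid; // the regression is at mid or earlier
            } else {
                fast = mid; // the regression is after mid
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Placeholder commit labels, not real Flink hashes.
        List<String> commits = List.of("c0", "c1", "c2", "c3", "c4", "c5");
        // Made-up test running times in seconds for each commit.
        Map<String, Integer> seconds =
                Map.of("c0", 2, "c1", 2, "c2", 2, "c3", 2, "c4", 60, "c5", 60);
        // Hypothetical threshold: treat anything above 10s as slow.
        int idx = findFirstSlow(commits, c -> seconds.get(c) > 10);
        System.out.println("first slow commit: " + commits.get(idx));
    }
}
```

With n candidate commits this needs about log2(n) test runs, which is why it is practical even when a single run takes a minute.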
> PartiallyFinishedSourcesITCase fails due to timeout
> ---------------------------------------------------
>
> Key: FLINK-24163
> URL: https://issues.apache.org/jira/browse/FLINK-24163
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Affects Versions: 1.14.0, 1.15.0
> Reporter: Xintong Song
> Assignee: Yun Gao
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23529&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=10996
> {code}
> Sep 04 04:35:28 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 155.236 s <<< FAILURE! - in org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 04 04:35:28 [ERROR] test[complex graph ALL_SUBTASKS, failover: false] Time elapsed: 65.999 s <<< ERROR!
> Sep 04 04:35:28 java.util.concurrent.TimeoutException: Condition was not met in given timeout.
> Sep 04 04:35:28 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:164)
> Sep 04 04:35:28 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:142)
> Sep 04 04:35:28 at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:134)
> Sep 04 04:35:28 at org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:297)
> Sep 04 04:35:28 at org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:219)
> Sep 04 04:35:28 at org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:82)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)