[
https://issues.apache.org/jira/browse/FLINK-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714523#comment-16714523
]
Stefan Richter commented on FLINK-10844:
----------------------------------------
Hi, sorry for the late reply. I cannot answer that question with the given
information. I would assume the problem is in the code that defines or modifies
the key objects for your keyed state. Did you have any new insights on this
issue in the meantime?
> ArrayIndexOutOfBoundsException during checkpoint (Flink 1.5.2)
> --------------------------------------------------------------
>
> Key: FLINK-10844
> URL: https://issues.apache.org/jira/browse/FLINK-10844
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Environment: Flink 1.5.2
> AWS EMR 5.17.0
> Reporter: Daniel Harper
> Priority: Minor
>
> We have seen this exception a few times in our production environment during
> checkpoints, which causes the job to restart
> {code}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint
> 2426 for operator
> x.x.x.x.streaming.pipeline.transformations.concurrentstreams.ConcurrentStreamsAggregator
> PERFORM COUNT DISTINCT OVER UUIDS FOR KEY ->
> ParDo(ToConcurrentStreamsResult)/ParMultiDo(ToConcurrentStreamsResult) ->
> JdbcIO.Write/ParDo(Write)/ParMultiDo(Write) (28/32).}
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2426 for
> operator
> x.x.x.x.streaming.pipeline.transformations.concurrentstreams.ConcurrentStreamsAggregator
> PERFORM COUNT DISTINCT OVER UUIDS FOR KEY ->
> ParDo(ToConcurrentStreamsResult)/ParMultiDo(ToConcurrentStreamsResult) ->
> JdbcIO.Write/ParDo(Write)/ParMultiDo(Write) (28/32).
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
> ... 6 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: -22
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
> ... 5 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -22
> at
> org.apache.flink.runtime.state.heap.CopyOnWriteStateTableSnapshot.partitionEntriesByKeyGroup(CopyOnWriteStateTableSnapshot.java:162)
> at
> org.apache.flink.runtime.state.heap.CopyOnWriteStateTableSnapshot.writeMappingsInKeyGroup(CopyOnWriteStateTableSnapshot.java:178)
> at
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:697)
> at
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:641)
> at
> org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
> ... 7 more
> {code}
> Looks like the index being computed here:
> https://github.com/apache/flink/blob/release-1.5.2/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/CopyOnWriteStateTableSnapshot.java#L161
> Is returning -22, causing the AOOB exception to fire
> https://github.com/apache/flink/blob/release-1.5.2/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/CopyOnWriteStateTableSnapshot.java#L162
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)