[ 
https://issues.apache.org/jira/browse/FLINK-25902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan closed FLINK-25902.
-------------------------------------
    Resolution: Duplicate

> NullPointerException in RescalingITCase.testSavepointRescalingOutKeyedState
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-25902
>                 URL: https://issues.apache.org/jira/browse/FLINK-25902
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>         Attachments: RescalingITCase.testSavepointRescalingOutKeyedState.log
>
>
> I experienced a [build 
> failure|https://dev.azure.com/mapohl/flink/_build/results?buildId=659&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1&l=10189]
>  in {{RescalingITCase.testSavepointRescalingOutKeyedState}} with a 
> {{NullPointerException}} appearing:
> {code:java}
> 12:17:36,702 [AsyncOperations-thread-1] INFO  
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - Flat 
> Map -> Sink: Unnamed (4/4)#0 - asynchronous part of checkpoint 5 could not be 
> completed.
> java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint was 
> canceled because a barrier from newer checkpoint was received.
>         at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> ~[?:1.8.0_292]
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
> ~[?:1.8.0_292]
>         at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:69)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
>  [flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_292]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_292]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: 
> Checkpoint was canceled because a barrier from newer checkpoint was received.
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.abortInternal(SingleCheckpointBarrierHandler.java:376)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.cancelSubsumedCheckpoint(SingleCheckpointBarrierHandler.java:463)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.checkNewCheckpoint(SingleCheckpointBarrierHandler.java:347)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:228)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:181)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.processPriorityEvents(CheckpointedInputGate.java:112)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
> ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:802)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:751)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>  ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         ... 1 more
> 12:17:36,707 [Flat Map -> Sink: Unnamed (1/4)#0] INFO  
> org.apache.flink.state.changelog.ChangelogKeyedStateBackend  [] - snapshot of 
> Flat Map -> Sink: Unnamed (1/4)#0 for checkpoint 6, change range: 0..13306
> 12:17:36,710 [Flat Map -> Sink: Unnamed (4/4)#0] INFO  
> org.apache.flink.state.changelog.ChangelogKeyedStateBackend  [] - snapshot of 
> Flat Map -> Sink: Unnamed (4/4)#0 for checkpoint 6, change range: 0..9002
> 12:17:36,720 [Flat Map -> Sink: Unnamed (4/4)#0] INFO  
> org.apache.flink.state.changelog.PeriodicMaterializationManager [] - Shutting 
> down PeriodicMaterializationManager.
> 12:17:36,720 [AsyncOperations-thread-1] INFO  
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - Flat 
> Map -> Sink: Unnamed (4/4)#0 - asynchronous part of checkpoint 6 could not be 
> completed.
> java.util.concurrent.CancellationException: null
>         at 
> java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2276) 
> ~[?:1.8.0_292]
>         at 
> org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:78)
>  ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFutures.lambda$cancel$0(OperatorSnapshotFutures.java:173)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.shaded.guava30.com.google.common.io.Closer.close(Closer.java:213)
>  ~[flink-shaded-guava-30.1.1-jre-14.0.jar:30.1.1-jre-14.0]
>         at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotFutures.cancel(OperatorSnapshotFutures.java:185)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.cleanup(AsyncCheckpointRunnable.java:391)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.close(AsyncCheckpointRunnable.java:356)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:294) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:281) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.close(SubtaskCheckpointCoordinatorImpl.java:480)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:294) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:281) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.core.fs.CloseableRegistry.doClose(CloseableRegistry.java:74) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.util.AbstractAutoCloseableRegistry.close(AbstractAutoCloseableRegistry.java:127)
>  ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.util.IOUtils.closeAll(IOUtils.java:254) 
> ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.core.fs.AutoCloseableRegistry.doClose(AutoCloseableRegistry.java:72)
>  ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.util.AbstractAutoCloseableRegistry.close(AbstractAutoCloseableRegistry.java:127)
>  ~[flink-core-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUp(StreamTask.java:914)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.lambda$restoreAndInvoke$0(Task.java:930)
>  ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>  ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:930) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 12:17:36,722 [Channel state writer Flat Map -> Sink: Unnamed (4/4)#0] INFO  
> org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl
>  [] - Flat Map -> Sink: Unnamed (4/4)#0 discarding 0 drained requests
> 12:17:36,722 [Flat Map -> Sink: Unnamed (4/4)#0] INFO  
> org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl
>  [] - Flat Map -> Sink: Unnamed (4/4)#0 discarding 1 drained requests
> 12:17:36,723 [Flat Map -> Sink: Unnamed (4/4)#0] WARN  
> org.apache.flink.runtime.taskmanager.Task                    [] - Flat Map -> 
> Sink: Unnamed (4/4)#0 (185244134ae888669016a5f2b4282d26) switched from 
> RUNNING to FAILED with failure cause: java.lang.NullPointerException
>         at 
> org.apache.flink.state.changelog.ChangelogKeyedStateBackend.notifyCheckpointAborted(ChangelogKeyedStateBackend.java:536)
>         at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.notifyCheckpointAborted(StreamOperatorStateHandler.java:298)
>         at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.notifyCheckpointAborted(AbstractStreamOperator.java:383)
>         at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointAborted(AbstractUdfStreamOperator.java:132)
>         at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointAborted(RegularOperatorChain.java:158)
>         at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:406)
>         at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointAborted(SubtaskCheckpointCoordinatorImpl.java:352)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointAbortAsync$15(StreamTask.java:1327)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$17(StreamTask.java:1350)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:802)
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:751)
>         at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>         at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>         at java.lang.Thread.run(Thread.java:748)
> 12:17:36,723 [Flat Map -> Sink: Unnamed (4/4)#0] INFO  
> org.apache.flink.runtime.taskmanager.Task                    [] - Freeing 
> task resources for Flat Map -> Sink: Unnamed (4/4)#0 
> (185244134ae888669016a5f2b4282d26).
> 12:17:36,727 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - 
> Un-registering task and sending final execution state FAILED to JobManager 
> for task Flat Map -> Sink: Unnamed (4/4)#0 185244134ae888669016a5f2b4282d26.
> 12:17:36,728 [flink-akka.actor.default-dispatcher-8] INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map -> 
> Sink: Unnamed (4/4) (185244134ae888669016a5f2b4282d26) switched from RUNNING 
> to FAILED on 04d463f4-6c87-415e-b8ad-dd420195d7cf @ localhost (dataPort=40473).
> java.lang.NullPointerException: null
>         at 
> org.apache.flink.state.changelog.ChangelogKeyedStateBackend.notifyCheckpointAborted(ChangelogKeyedStateBackend.java:536)
>  ~[flink-statebackend-changelog-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.notifyCheckpointAborted(StreamOperatorStateHandler.java:298)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.notifyCheckpointAborted(AbstractStreamOperator.java:383)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointAborted(AbstractUdfStreamOperator.java:132)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointAborted(RegularOperatorChain.java:158)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:406)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointAborted(SubtaskCheckpointCoordinatorImpl.java:352)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointAbortAsync$15(StreamTask.java:1327)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$17(StreamTask.java:1350)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
> ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:802)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:751)
>  ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>  ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
> ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
> {code}
> The branch on which this test fails does touch checkpoint-related code: it 
> recovers checkpoints as part of the FLIP-194 / JobResultStore work. However, 
> it does not reinstantiate the {{CheckpointCoordinator}}, so I suspect that 
> something else is going wrong.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
