[ 
https://issues.apache.org/jira/browse/FLINK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495849#comment-17495849
 ] 

Yue Ma commented on FLINK-26196:
--------------------------------

[~hjw] 
It may not only be exception information, can you post the complete log 
corresponding to JM and TM .For example, whether the Checkpoint has been 
successful in the past few times, and whether a failover has occurred
I feel that there is too little information available at the moment

> error when Incremental Checkpoints  by RocksDb 
> -----------------------------------------------
>
>                 Key: FLINK-26196
>                 URL: https://issues.apache.org/jira/browse/FLINK-26196
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / State Backends
>    Affects Versions: 1.13.2
>            Reporter: hjw
>            Priority: Critical
>         Attachments: image-2022-02-22-12-02-46-835.png
>
>
> When I use Incremental Checkpoints by RocksDb , errors happen occasionally. 
> Fortunately,Flink job is running normally
> Log:
> {code:java}
> java.io.IOException: Could not perform checkpoint 2804 for operator 
> cc-rule-keyByAndReduceStream (2/8)#1.
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
>  at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
>  at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
>  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
> complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. 
> Failure reason: Checkpoint was declined.
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
>  at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
>  ... 19 more
> Caused by: org.rocksdb.RocksDBException: while link file to 
> /opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/chk-2804.tmp/000279.sst:
>  
> /opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/db/000279.sst:
>  File exists
>  at org.rocksdb.Checkpoint.createCheckpoint(Native Method)
>  at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51)
>  at 
> org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.takeDBNativeCheckpoint(RocksIncrementalSnapshotStrategy.java:288)
>  at 
> org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.syncPrepareResources(RocksIncrementalSnapshotStrategy.java:157)
>  at 
> org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.syncPrepareResources(RocksIncrementalSnapshotStrategy.java:83)
>  at 
> org.apache.flink.runtime.state.SnapshotStrategyRunner.snapshot(SnapshotStrategyRunner.java:77)
>  at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:551)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:241)
>  ... 29 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to