[ 
https://issues.apache.org/jira/browse/FLINK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494456#comment-17494456
 ] 

Gen Luo commented on FLINK-26233:
---------------------------------

This happens because, when compaction is switched from on to off, the pending 
files remaining in the compactor state are flushed to the committer at the first 
checkpoint after the restart. The committer state can therefore grow beyond its 
steady-state size; in this test case it exceeds 5MB.

I have created a PR that fixes this by reducing the speed of the source and 
using FileSystemCheckpointStorage.
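
To make the reasoning above concrete, here is a rough back-of-the-envelope 
model in plain Java. It is only a sketch and does not use any Flink API. The 
class name, the per-file state size, the source rates and the number of 
buffered seconds are all hypothetical values chosen for illustration; only the 
5MB figure comes from the observation above (presumably a size limit of the 
checkpoint storage the test used before the fix).

```java
public class CompactionSwitchModel {
    // Threshold taken from the "more than 5MB" figure in the comment.
    public static final long STATE_LIMIT_BYTES = 5L * 1024 * 1024;

    // Hypothetical serialized size of one pending-file entry in state.
    public static final long BYTES_PER_PENDING_FILE = 1024;

    /** Committer state in a normal checkpoint: only files of one interval. */
    public static long steadyStateBytes(long filesPerSecond, long intervalSeconds) {
        return filesPerSecond * intervalSeconds * BYTES_PER_PENDING_FILE;
    }

    /**
     * Committer state at the first checkpoint after restarting with
     * compaction switched off: the files of one interval plus everything
     * that was still buffered in the compactor state (assumed here to be
     * bufferedSeconds worth of source output).
     */
    public static long firstCheckpointBytes(
            long filesPerSecond, long bufferedSeconds, long intervalSeconds) {
        long leftover = filesPerSecond * bufferedSeconds;
        long fresh = filesPerSecond * intervalSeconds;
        return (leftover + fresh) * BYTES_PER_PENDING_FILE;
    }

    public static void main(String[] args) {
        long fastSource = 1000; // hypothetical files per second
        long slowSource = 100;  // hypothetical throttled rate

        System.out.println("limit:                  " + STATE_LIMIT_BYTES);
        System.out.println("fast, steady state:     " + steadyStateBytes(fastSource, 1));
        System.out.println("fast, first checkpoint: " + firstCheckpointBytes(fastSource, 5, 1));
        System.out.println("slow, first checkpoint: " + firstCheckpointBytes(slowSource, 5, 1));
    }
}
```

With these made-up numbers, the fast source stays under the limit in steady 
state but exceeds it at the first checkpoint after the switch, while the 
throttled source stays under it in both cases. Moving the checkpoint data to a 
file system is the second half of the fix and is not modelled here.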

> FileSinkCompactionSwitchITCase.testSwitchingCompaction() fails in CI
> --------------------------------------------------------------------
>
>                 Key: FLINK-26233
>                 URL: https://issues.apache.org/jira/browse/FLINK-26233
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.15.0
>            Reporter: Alexander Fedulov
>            Priority: Major
>              Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-02-17T20:13:20.2895110Z Feb 17 20:13:20 [INFO] Running 
> org.apache.flink.connector.file.sink.writer.FileSinkMigrationITCase
> 2022-02-17T20:13:40.2160260Z Feb 17 20:13:40 [INFO] Tests run: 2, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 19.905 s - in 
> org.apache.flink.connector.file.sink.writer.FileSinkMigrationITCase
> 2022-02-17T20:13:58.8860609Z Feb 17 20:13:58 [ERROR] Tests run: 2, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 102.488 s <<< FAILURE! - in 
> org.apache.flink.connector.file.sink.FileSinkCompactionSwitchITCase
> 2022-02-17T20:13:58.8864562Z Feb 17 20:13:58 [ERROR] 
> FileSinkCompactionSwitchITCase.testSwitchingCompaction  Time elapsed: 37.28 s 
>  <<< ERROR!
> 2022-02-17T20:13:58.8865526Z Feb 17 20:13:58 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2022-02-17T20:13:58.8866319Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> 2022-02-17T20:13:58.8867102Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:934)
> 2022-02-17T20:13:58.8867985Z Feb 17 20:13:58     at 
> org.apache.flink.connector.file.sink.FileSinkCompactionSwitchITCase.testSwitchingCompaction(FileSinkCompactionSwitchITCase.java:175)
> [...]
> 2022-02-17T20:13:58.8919634Z Feb 17 20:13:58 Caused by: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
> [...]
> 2022-02-17T20:13:58.8939468Z Feb 17 20:13:58 Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable 
> failure threshold.
> 2022-02-17T20:13:58.8940119Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager.checkFailureAgainstCounter(CheckpointFailureManager.java:160)
> 2022-02-17T20:13:58.8940863Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleTaskLevelCheckpointException(CheckpointFailureManager.java:145)
> 2022-02-17T20:13:58.8941613Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:97)
> 2022-02-17T20:13:58.8942321Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2046)
> 2022-02-17T20:13:58.8943011Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:1040)
> 2022-02-17T20:13:58.8943830Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$declineCheckpoint$2(ExecutionGraphHandler.java:103)
> 2022-02-17T20:13:58.8944567Z Feb 17 20:13:58     at 
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
> 2022-02-17T20:13:58.8945240Z Feb 17 20:13:58     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2022-02-17T20:13:58.8945794Z Feb 17 20:13:58     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2022-02-17T20:13:58.8946276Z Feb 17 20:13:58     at 
> java.lang.Thread.run(Thread.java:748)  {code}
> https://dev.azure.com/alexanderfedulov/Flink/_build/results?buildId=37&view=logs&j=dafbab6d-4616-5d7b-ee37-3c54e4828fd7&t=e204f081-e6cd-5c04-4f4c-919639b63be9&l=11112



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
