[ 
https://issues.apache.org/jira/browse/FLINK-28872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huang Xingbo closed FLINK-28872.
--------------------------------
    Resolution: Duplicate

> UnalignedCheckpointStressITCase fails with NoSuchFileException
> --------------------------------------------------------------
>
>                 Key: FLINK-28872
>                 URL: https://issues.apache.org/jira/browse/FLINK-28872
>             Project: Flink
>          Issue Type: Bug
>          Components: Test Infrastructure
>    Affects Versions: 1.16.0
>            Reporter: Lihe Ma
>            Priority: Minor
>         Attachments: mvn-3-1.log
>
>
> UnalignedCheckpointStressITCase fails occasionally.
> According to the logs of one failed attempt, the randomized configuration was:
>  *  false for taskmanager.network.memory.buffer-debloat.enabled
>  *  false for execution.checkpointing.unaligned
>  *  PT0.1S for execution.checkpointing.alignment-timeout
>  *  false for state.backend.changelog.enabled
> It failed while trying to fetch the latest retained checkpoint.
> {code:java}
> Caused by: java.nio.file.NoSuchFileException: 
> /tmp/junit933601030800674266/ea67eac6fd3f8192f43aae35952a64e7/chk-6/6e234f61-ba86-4fc6-9739-980ae3edb682
>  {code}
> The test tried to read chk-6, but chk-7 had completed before the job 
> finished, and the test case did not recognize chk-7 as the latest retained 
> checkpoint.
> {code:java}
> 04:59:34,077 [    Checkpoint Timer] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
> checkpoint 6 (type=CheckpointType{name='Checkpoint', 
> sharingFilesStrategy=FORWARD_BACKWARD}) @ 1659848374076 for job 
> ea67eac6fd3f8192f43aae35952a64e7.
> 04:59:35,284 [jobmanager-io-thread-11] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
> checkpoint 6 for job ea67eac6fd3f8192f43aae35952a64e7 (15635521 bytes, 
> checkpointDuration=1208 ms, finalizationTime=0 ms).
> 04:59:35,285 [    Checkpoint Timer] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
> checkpoint 7 (type=CheckpointType{name='Checkpoint', 
> sharingFilesStrategy=FORWARD_BACKWARD}) @ 1659848375285 for job 
> ea67eac6fd3f8192f43aae35952a64e7.
> 04:59:35,950 [jobmanager-io-thread-10] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
> checkpoint 7 for job ea67eac6fd3f8192f43aae35952a64e7 (15775934 bytes, 
> checkpointDuration=665 ms, finalizationTime=0 ms).
> 04:59:35,951 [    Checkpoint Timer] INFO  
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
> checkpoint 8 (type=CheckpointType{name='Checkpoint', 
> sharingFilesStrategy=FORWARD_BACKWARD}) @ 1659848375951 for job 
> ea67eac6fd3f8192f43aae35952a64e7.
> 04:59:35,952 [Channel state writer Map -> Map (1/1)#0] INFO  
> org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl
>  [] - Map -> Map (1/1)#0 discarding 0 drained requests
> 04:59:35,953 [  Map -> Map (1/1)#0] WARN  
> org.apache.flink.runtime.taskmanager.Task                    [] - Map -> Map (1/1)#0 
> (8e004c2ca00f22588638cc354972c8be_624c2fac4e5e1bf52e83f8a978720139_0_0) 
> switched from RUNNING to FAILED with failure cause: 
> org.apache.flink.runtime.operators.testutils.ExpectedTestException: 
> Record(sourceId=7, payload.length=4096, value=741)
> {code}
> The log above is from a failed pipeline on a non-master branch, but we also 
> reproduced the bug locally on master.
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39472&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba]
>  
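
For illustration only, the lookup described in the report (picking chk-7 rather than chk-6) could be sketched as below. This is not the test's actual code and not the fix applied in the duplicate ticket. The class name LatestCheckpointFinder and the method findLatest are hypothetical; the sketch assumes retained checkpoints live in directories named chk-<id> under the job's checkpoint directory and that a finalized checkpoint contains a _metadata file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of Flink or of the test in this ticket.
public class LatestCheckpointFinder {

    // Assumed layout: <jobCheckpointDir>/chk-<id>/_metadata
    private static final String PREFIX = "chk-";
    private static final String METADATA = "_metadata";

    /**
     * Returns the chk-<id> directory with the highest id that contains a
     * metadata file, or an empty Optional if there is none.
     */
    public static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        Path best = null;
        long bestId = -1;
        try (DirectoryStream<Path> dirs =
                Files.newDirectoryStream(jobCheckpointDir, PREFIX + "*")) {
            for (Path dir : dirs) {
                long id;
                try {
                    String name = dir.getFileName().toString();
                    id = Long.parseLong(name.substring(PREFIX.length()));
                } catch (NumberFormatException e) {
                    continue; // not a chk-<number> directory, skip it
                }
                // Ignore directories without metadata (in progress or already discarded).
                if (id > bestId && Files.isRegularFile(dir.resolve(METADATA))) {
                    best = dir;
                    bestId = id;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Called on a directory containing finalized chk-6 and chk-7 plus an unfinished chk-8, findLatest would return chk-7. Scanning the directory only after the job has reached a terminal state matters here, since a checkpoint completing during the scan can still cause an older one to be removed.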



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
