[ 
https://issues.apache.org/jira/browse/FLINK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183874#comment-17183874
 ] 

Yun Gao commented on FLINK-18592:
---------------------------------

Hi [~ALVINWJ] [~liyu], how do we plan to fix this issue? It seems there is 
always a chance that recovery does not succeed for a long time.

And [~ALVINWJ], I am wondering whether restoring from earlier checkpoints 
could always succeed, since they are not consistent with the files in the 
FileSystem. For example, some files may have been written and committed in 
later checkpoints, but after restoring we could not remove them, thus 
causing data duplication.
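
As an aside, the workaround described in the report (keeping the last few 
checkpoints and manually restarting from an older one) relies on checkpoint 
retention being configured. A minimal flink-conf.yaml sketch, assuming the 
retained count of 3 mentioned in the report:

{code}
# Retain the last 3 completed checkpoints instead of only the most recent one,
# so an older checkpoint is still available if restoring from the latest fails.
state.checkpoints.num-retained: 3
{code}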

> StreamingFileSink fails due to truncating HDFS file failure
> -----------------------------------------------------------
>
>                 Key: FLINK-18592
>                 URL: https://issues.apache.org/jira/browse/FLINK-18592
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.10.1
>            Reporter: JIAN WANG
>            Priority: Major
>             Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> I meet the issue on flink-1.10.1. I use flink on YARN(3.0.0-cdh6.3.2) with 
> StreamingFileSink. 
> code part like this:
> {code}
> public static <IN> StreamingFileSink<IN> build(String dir, BucketAssigner<IN, String> assigner, String prefix) {
>     return StreamingFileSink.forRowFormat(new Path(dir), new SimpleStringEncoder<IN>())
>             .withRollingPolicy(
>                     DefaultRollingPolicy
>                             .builder()
>                             .withRolloverInterval(TimeUnit.HOURS.toMillis(2))
>                             .withInactivityInterval(TimeUnit.MINUTES.toMillis(10))
>                             .withMaxPartSize(1024L * 1024L * 1024L * 50) // Max 50GB
>                             .build())
>             .withBucketAssigner(assigner)
>             .withOutputFileConfig(OutputFileConfig.builder().withPartPrefix(prefix).build())
>             .build();
> }
> {code}
> The error is:
> {noformat}
> java.io.IOException:
> Problem while truncating file:
> hdfs:///business_log/hashtag/2020-06-25/.hashtag-122-37.inprogress.8e65f69c-b5ba-4466-a844-ccc0a5a93de2
> {noformat}
> Due to this issue, the job cannot restart from the latest checkpoint or 
> savepoint.
> Currently, my workaround is to keep the latest 3 checkpoints; if the job 
> fails, I manually restart it from the penultimate checkpoint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
