[ 
https://issues.apache.org/jira/browse/FLINK-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198727#comment-16198727
 ] 

Gabor Gevay commented on FLINK-1268:
------------------------------------

I just ran into this issue. I ran my job locally with parallelism 8 and then 
later with 4, and spent an hour debugging before I figured out what went 
wrong.

> FileOutputFormat with overwrite does not clear local output directories
> -----------------------------------------------------------------------
>
>                 Key: FLINK-1268
>                 URL: https://issues.apache.org/jira/browse/FLINK-1268
>             Project: Flink
>          Issue Type: Bug
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: Till Rohrmann
>            Priority: Minor
>
> I noticed that the FileOutputFormat does not clear the output directories if 
> it writes to local disk. As a consequence, partition files from a previous 
> run remain in the directory if one decreases the degree of parallelism (DOP) 
> between subsequent runs. If one then reads the data from this directory, more 
> partitions are read in than were actually written. This can lead to wrong 
> user code behaviour that is hard to debug. I'm aware that in the case of a 
> distributed execution the TaskManagers or the Tasks have to be responsible 
> for the cleanup, and that if multiple Tasks are running on a TaskManager, the 
> cleanup has to be coordinated.
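
The effect described above can be reproduced without Flink. The following is a minimal sketch in plain Java NIO, not Flink's implementation: it assumes each parallel subtask writes one file named after its 1-based index into the output directory and overwrites only its own file. The class name `StaleOutputDemo` and its helper methods are hypothetical and exist only for illustration. It also shows one possible workaround for local runs, which is to delete the output directory before the job starts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleOutputDemo {

    // Mimics one run (assumption: one file per subtask, named 1..parallelism).
    // Each "subtask" overwrites its own file but never touches other files.
    static void writePartitions(Path dir, int parallelism) throws IOException {
        Files.createDirectories(dir);
        for (int i = 1; i <= parallelism; i++) {
            String content = "written with parallelism " + parallelism + "\n";
            // Files.write truncates an existing file by default.
            Files.write(dir.resolve(String.valueOf(i)),
                    content.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Counts the entries a reader of the directory would pick up.
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        }
    }

    // Workaround sketch: remove the whole output directory before a run.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("flink-1268-demo").resolve("out");

        writePartitions(out, 8);
        writePartitions(out, 4); // files 5..8 from the first run survive
        System.out.println("without cleanup: " + countFiles(out) + " files");

        deleteRecursively(out);
        writePartitions(out, 4);
        System.out.println("with cleanup: " + countFiles(out) + " files");
    }
}
```

Run as written, the first count is 8 and the second is 4. Deleting the directory up front is only safe when a single process owns it, which is why the description notes that a distributed execution needs coordinated cleanup.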



