[ https://issues.apache.org/jira/browse/FLINK-27687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaël Renoux updated FLINK-27687:
--------------------------------
Summary: Flink shouldn't assume temp folders keep existing when unused
(was: SpanningWrapper shouldn't assume temp folder exists)
> Flink shouldn't assume temp folders keep existing when unused
> -------------------------------------------------------------
>
> Key: FLINK-27687
> URL: https://issues.apache.org/jira/browse/FLINK-27687
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Network
> Affects Versions: 1.14.4
> Reporter: Gaël Renoux
> Priority: Major
>
> SpanningWrapper.createSpillingChannel assumes that the folder in which it
> creates the file exists. However, this is not the case in the following
> scenario (which actually happened to us today):
> * The temp folders were created in /tmp a while ago (I assume on
> task-manager startup). They weren't used for a while, probably because we
> didn't have any record big enough to trigger spilling.
> * The cron job that cleans up /tmp did its job and deleted those old
> folders.
> * We deployed a new version of the job that actually needed the folders, and
> it crashed.
> => I'm not sure whether it should be SpanningWrapper's responsibility to
> recreate the folder if it no longer exists, as I'm not familiar enough with
> Flink's internals to guess which class should do it. We hit the problem in
> SpanningWrapper, but it can probably happen in other places as well.
> More generally, assuming that folders and files in /tmp will never get
> deleted doesn't seem correct to me. The
> [documentation for io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/]
> recommends that these directories not be purged, but we do need to clean up
> at some point. If Flink cannot tolerate purges, the documentation should
> state that this is a requirement rather than a recommendation, and that
> purges will break jobs (not just trigger a recovery).
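A minimal, hypothetical sketch of the defensive behaviour discussed above: recreate the spill directory immediately before creating each spill file, instead of assuming it has survived since task-manager startup. `SpillDirs`, `createSpillFile` and `openSpillingChannel` are illustrative names, not Flink APIs, and this is not how `SpanningWrapper.createSpillingChannel` is actually implemented. The `main` method replays the reported scenario: create the folder early, let a "cleaner" delete it, then try to spill.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper, not part of Flink: creates spill files in a temp
 * directory and recreates that directory first if an external cleaner
 * (for example a /tmp cron job) has deleted it since startup.
 */
public class SpillDirs {

    /** Ensures spillDir exists (creating missing parents too), then creates a fresh file in it. */
    public static Path createSpillFile(Path spillDir) throws IOException {
        // No-op if the directory is still there; recreates it if it was purged.
        Files.createDirectories(spillDir);
        return Files.createTempFile(spillDir, "spill-", ".tmp");
    }

    /** Opens a new spill file for reading and writing; the file is deleted when the channel closes. */
    public static FileChannel openSpillingChannel(Path spillDir) throws IOException {
        Path file = createSpillFile(spillDir);
        return FileChannel.open(file,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE,
                StandardOpenOption.DELETE_ON_CLOSE);
    }

    /** Deletes a directory tree bottom-up; used here to simulate the /tmp cleanup. */
    public static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order visits children before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("flink-io-");
        Path spillDir = base.resolve("spill");
        Files.createDirectories(spillDir); // created early, e.g. at task-manager startup

        deleteRecursively(spillDir); // the /tmp cleaner purges the unused folder

        // Naive approach: assume the folder still exists.
        try {
            Files.createTempFile(spillDir, "spill-", ".tmp");
            System.out.println("naive create succeeded");
        } catch (IOException e) {
            // Typically a NoSuchFileException because the parent directory is gone.
            System.out.println("naive create failed: " + e.getClass().getSimpleName());
        }

        // Defensive approach: recreate the folder before opening the spill file.
        try (FileChannel channel = openSpillingChannel(spillDir)) {
            System.out.println("spill channel open: " + channel.isOpen());
        }

        deleteRecursively(base);
    }
}
```

`Files.createDirectories` does nothing when the directory already exists, so calling it before every spill is cheap. It does not close the window in which a cleaner deletes the folder between the two calls, so a real fix would presumably still need to handle (or retry on) a missing-directory error.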
--
This message was sent by Atlassian Jira
(v8.20.7#820007)