[ https://issues.apache.org/jira/browse/SPARK-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-8966:
-------------------------------
Parent Issue: SPARK-9697 (was: SPARK-9565)
> Design a mechanism to ensure that temporary files created in tasks are cleaned up after failures
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-8966
> URL: https://issues.apache.org/jira/browse/SPARK-8966
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Reporter: Josh Rosen
>
> It's important to avoid leaking temporary files, such as spill files created
> by the external sorter. Individual operators should still make an effort to
> clean up their own files / perform their own error handling, but I think that
> we should add a safety-net mechanism to track file creation on a per-task
> basis and automatically clean up leaked files.
> During tests, this mechanism should throw an exception when a leak is
> detected. In production deployments, it should log a warning and clean up the
> leak itself. This is similar to the TaskMemoryManager's leak detection and
> cleanup code.
> We may be able to implement this via a convenience method that registers task
> completion handlers with TaskContext.
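To make the safety-net idea above concrete, here is a rough, language-neutral sketch in Python. It is not Spark code: TaskTempFileTracker, TempFileLeakError, the fail_on_leak flag, and on_task_completion are all hypothetical names invented for illustration. The only assumption is that the task runtime gives us some hook that fires when a task finishes, which is where on_task_completion would be registered.

```python
import logging
import os
import tempfile

log = logging.getLogger("task_temp_files")


class TempFileLeakError(RuntimeError):
    """Raised in test mode when a task finishes with temp files still on disk."""


class TaskTempFileTracker:
    """Hypothetical per-task registry of temporary files (not a Spark API)."""

    def __init__(self, task_id, fail_on_leak):
        self.task_id = task_id
        self.fail_on_leak = fail_on_leak  # assumed: True in tests, False in production
        self._paths = set()

    def create_temp_file(self, suffix=".spill"):
        """Create a temp file and remember it for end-of-task cleanup."""
        fd, path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        self._paths.add(path)
        return path

    def release(self, path):
        """Called by an operator that cleans up its own file."""
        self._paths.discard(path)
        if os.path.exists(path):
            os.remove(path)

    def on_task_completion(self):
        """Safety net: delete anything still registered, then report the leak."""
        leaked = sorted(p for p in self._paths if os.path.exists(p))
        for path in leaked:
            os.remove(path)
        self._paths.clear()
        if leaked:
            msg = "task %s leaked %d temp file(s): %s" % (
                self.task_id, len(leaked), leaked)
            if self.fail_on_leak:
                raise TempFileLeakError(msg)
            log.warning(msg)
        return leaked
```

Note that the sketch deletes the leaked files before raising, so even a failing test run does not leave files behind. Operators that behave well call release() themselves and never trigger the warning.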
> We might also explore techniques that will cause files to be cleaned up
> automatically when their file descriptors are closed (e.g. by calling unlink
> on an open file). These techniques should not be our last line of defense
> against file resource leaks, though, since they might be platform-specific
> and may clean up resources later than we'd like.
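For reference, the unlink-on-an-open-file technique mentioned above looks roughly like this on a POSIX system. This is a minimal Python sketch, not anything from Spark; open_unlinked_temp_file is a made-up helper name. It also shows the platform caveat: on Windows, removing a file that is still open generally fails, so this cannot be the only defense.

```python
import os
import tempfile


def open_unlinked_temp_file():
    """Create a temp file, then unlink it while keeping it open (POSIX only).

    The directory entry disappears immediately, but the data stays readable
    and writable through the open handle. The OS reclaims the space once the
    last descriptor is closed, including when the process dies.
    """
    fd, path = tempfile.mkstemp(suffix=".spill")
    os.unlink(path)  # assumption: POSIX semantics; fails on Windows
    return os.fdopen(fd, "w+b"), path


f, path = open_unlinked_temp_file()
f.write(b"spilled records")
f.seek(0)
data = f.read()                        # contents are still readable
still_listed = os.path.exists(path)    # the path itself is already gone
f.close()                              # storage is released here
```

The trade-off is that the file can no longer be reopened by path, so this only suits files that are read back through the same handle that wrote them.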