Josh Rosen created SPARK-8966:
---------------------------------

             Summary: Design a mechanism to ensure that temporary files created 
in tasks are cleaned up after failures
                 Key: SPARK-8966
                 URL: https://issues.apache.org/jira/browse/SPARK-8966
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Josh Rosen


It's important to avoid leaking temporary files, such as spill files created by 
the external sorter.  Individual operators should still make an effort to clean 
up their own files and handle their own errors, but I think that we 
should add a safety-net mechanism to track file creation on a per-task basis 
and automatically clean up leaked files.

During tests, this mechanism should throw an exception when a leak is detected. 
In production deployments, it should log a warning and clean up the leak 
itself.  This is similar to the TaskMemoryManager's leak detection and cleanup 
code.
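
As a rough illustration of the two behaviors described above, here is a minimal Java sketch of a per-task tracker. It is not existing Spark code: the class name TaskTempFileTracker, its methods, and the failOnLeak flag are all hypothetical, and a file is treated as "leaked" simply if it still exists on disk when the task finishes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper (not a Spark class): tracks the temp files of one task.
final class TaskTempFileTracker {
  private static final Logger LOG =
      Logger.getLogger(TaskTempFileTracker.class.getName());

  private final Set<Path> tracked = new LinkedHashSet<>();
  private final boolean failOnLeak; // assumed: true in tests, false in production

  TaskTempFileTracker(boolean failOnLeak) {
    this.failOnLeak = failOnLeak;
  }

  /** Creates a temp file and records it so it can be checked at task end. */
  synchronized Path createTempFile(String prefix) {
    try {
      Path file = Files.createTempFile(prefix, ".tmp");
      tracked.add(file);
      return file;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Intended to be called once when the task finishes. Deletes every tracked
   * file that still exists, then either throws (test mode) or logs a warning
   * (production mode). Returns the files that had leaked.
   */
  synchronized List<Path> cleanUpLeaks() {
    List<Path> leaked = new ArrayList<>();
    for (Path file : tracked) {
      if (Files.exists(file)) {
        leaked.add(file);
      }
    }
    tracked.clear();
    for (Path file : leaked) {
      try {
        Files.deleteIfExists(file);
      } catch (IOException e) {
        LOG.warning("Could not delete leaked file " + file + ": " + e);
      }
    }
    if (!leaked.isEmpty()) {
      String message =
          "Task leaked " + leaked.size() + " temporary file(s): " + leaked;
      if (failOnLeak) {
        throw new IllegalStateException(message);
      }
      LOG.warning(message);
    }
    return leaked;
  }
}
```

In this sketch the leaked files are deleted before the exception is thrown, so a failing test does not itself leave files behind. Whether the real mechanism should behave that way is an open design choice.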

We may be able to implement this via a convenience method that registers task 
completion handlers with TaskContext.
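
One possible shape for such a convenience method is sketched below in Java. To keep the example self-contained it uses a small stand-in class, SketchTaskContext, instead of Spark's TaskContext. Both that class and the TaskTempFiles helper are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the small part of a task context this sketch needs.
// Hypothetical; it is not Spark's TaskContext.
final class SketchTaskContext {
  private final List<Runnable> completionListeners = new ArrayList<>();

  void addCompletionListener(Runnable listener) {
    completionListeners.add(listener);
  }

  // Runs every listener even if an earlier one fails, then rethrows the
  // first failure so that it is not silently lost.
  void markTaskCompleted() {
    RuntimeException first = null;
    for (Runnable listener : completionListeners) {
      try {
        listener.run();
      } catch (RuntimeException e) {
        if (first == null) {
          first = e;
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}

// Hypothetical convenience helper: create a file and register its cleanup.
final class TaskTempFiles {
  private TaskTempFiles() {}

  static Path createTempFile(SketchTaskContext context, String prefix) {
    final Path file;
    try {
      file = Files.createTempFile(prefix, ".tmp");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The handler is registered immediately, so the file is removed when
    // the task completes even if the operator never cleans it up.
    context.addCompletionListener(() -> {
      try {
        Files.deleteIfExists(file);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
    return file;
  }
}
```

In Spark itself the registration would presumably go through TaskContext.addTaskCompletionListener rather than a stand-in like this one. An operator that deletes its own file earlier is unaffected, because deleteIfExists is a no-op for a file that is already gone.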

We might also explore techniques that will cause files to be cleaned up 
automatically when their file descriptors are closed (e.g. by calling unlink on 
an open file). These techniques should not be our last line of defense against 
file resource leaks, though, since they might be platform-specific and may 
clean up resources later than we'd like.
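
For reference, the JVM offers a related facility: the StandardOpenOption.DELETE_ON_CLOSE open option. The Java sketch below uses that option rather than a direct unlink call, so it illustrates the general idea and not the exact technique mentioned above. The class and method names are hypothetical, and the Javadoc describes the deletion as a best-effort attempt, which is consistent with the caveat about not relying on it alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: a file that is removed when its channel closes.
final class DeleteOnCloseSketch {
  private DeleteOnCloseSketch() {}

  /** Opens a new file that the JVM will try to delete when the channel is closed. */
  static FileChannel openSelfDeleting(Path path) throws IOException {
    return FileChannel.open(
        path,
        StandardOpenOption.CREATE_NEW,
        StandardOpenOption.READ,
        StandardOpenOption.WRITE,
        StandardOpenOption.DELETE_ON_CLOSE);
  }

  /**
   * Writes some bytes through a self-deleting file in the given directory and
   * reports whether the path still exists after the channel has been closed.
   */
  static boolean existsAfterClose(Path dir) {
    Path path = dir.resolve("spill-" + System.nanoTime() + ".tmp");
    try (FileChannel channel = openSelfDeleting(path)) {
      channel.write(ByteBuffer.wrap("spilled".getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Files.exists(path);
  }
}
```

Because the deletion is tied to closing the channel, a channel that is itself leaked delays cleanup until the JVM exits or longer, which is one reason to keep an explicit per-task safety net as well.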



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

