Adam Binford created SPARK-44639:
------------------------------------

             Summary: Add option to use Java tmp dir for RocksDB state store
                 Key: SPARK-44639
                 URL: https://issues.apache.org/jira/browse/SPARK-44639
             Project: Spark
          Issue Type: Improvement
          Components: SQL, Structured Streaming
    Affects Versions: 3.4.1
            Reporter: Adam Binford


Currently local RocksDB state is stored in a local directory given by 
Utils.getLocalDir. On yarn this is a directory created inside the root 
application folder such as

{{/tmp/nm-local-dir/usercache/<user>/appcache/<app_id>/}}

The problem with this is that if an executor crashes for some reason (like OOM) 
and the shutdown hooks don't get run, this directory will stay around forever 
until the application finishes, which can cause jobs to slowly accumulate more 
and more temporary space until finally the node manager goes unhealthy.

Because this data will only ever be accessed by the executor that created this 
directory, it would make sense to store the data inside the container folder, 
which will always get cleaned up by the node manager when that yarn container 
gets cleaned up. Yarn sets the `java.io.tmpdir` to a path inside this 
directory, such as

{{/tmp/nm-local-dir/usercache/<user>/appcache/<app_id>/<container_id>/tmp/}}

I'm not sure the behavior for other resource managers, so this could be an 
opt-in config that can be specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to