[ https://issues.apache.org/jira/browse/HUDI-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752332#comment-17752332 ]

Xinglong Wang commented on HUDI-3425:
-------------------------------------

I have encountered the same problem. I am using Flink on YARN. When the job 
executes compaction but hits an abnormal situation (for example, the 
container is running beyond physical memory limits, or another exception) and 
performs a full restart, `HoodieMergedLogRecordScanner` may still be scanning 
log files at that moment, so `ExternalSpillableMap#close()` is never executed 
to clean up. Spillable map files then accumulate in the /tmp directory, and 
eventually the disk is exhausted.
As a workaround, I now set `hoodie.memory.spillable.map.path` to the 
`$PWD/spillable-map/` directory when the YARN container launches (the `PWD` 
environment variable is exported in `launch_container.sh`), so that the 
spillable map files are cleaned up when the container is closed.
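
The idea above can be sketched as a few shell lines. This is a hypothetical illustration, not the actual `launch_container.sh`: the `spillable-map` subdirectory name is the commenter's choice, and how the property is passed to the job (here just echoed) depends on your deployment.

```shell
# Resolve the spill path under the YARN container working directory.
# PWD is exported by launch_container.sh, so anything created here is
# removed by YARN when the container is torn down -- even after a kill.
SPILL_DIR="${PWD}/spillable-map"
mkdir -p "$SPILL_DIR"

# Pass the resulting path to Hudi (illustration only; in practice this
# would go into the job's Hudi write options).
echo "hoodie.memory.spillable.map.path=$SPILL_DIR"
```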

> Clean up spill path created by Hudi during uneventful shutdown
> --------------------------------------------------------------
>
>                 Key: HUDI-3425
>                 URL: https://issues.apache.org/jira/browse/HUDI-3425
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.12.1
>
>
> h1. Hudi spill path not getting cleared when containers are killed 
> abruptly. 
>  
> When YARN kills containers abruptly for any reason while a Hudi stage is in 
> progress, the spill path created by Hudi on disk is not cleaned up, and as 
> a result the nodes on the cluster start running out of space. We need to 
> clear the spill path manually to free up disk space.
>  
> Ref issue: https://github.com/apache/hudi/issues/4771
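
Until the container-local workaround is in place, the manual cleanup the issue mentions can be sketched as below. This is a hedged example with assumed names: the `hudi-*` file pattern, the `/tmp/demo-spill` directory, and the 24-hour threshold are placeholders -- check the actual value of `hoodie.memory.spillable.map.path` on your cluster before deleting anything.

```shell
# Simulate a spill directory with one stale and one fresh spill file
# (the hudi-* names and the directory are assumptions for this demo).
SPILL_ROOT=/tmp/demo-spill
mkdir -p "$SPILL_ROOT"
touch -d '2 days ago' "$SPILL_ROOT/hudi-old.data"  # stale: left by a killed container
touch "$SPILL_ROOT/hudi-new.data"                  # fresh: may still be in use

# Delete spill files not modified within the last 24 hours (1440 minutes),
# leaving recently written files alone.
find "$SPILL_ROOT" -name 'hudi-*' -mmin +1440 -delete
```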



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
