wangmeng created HUDI-1652:
------------------------------

             Summary: DiskBasedMap:As time goes by, the number of /temp/***** 
file handles held by the executor process is increasing
                 Key: HUDI-1652
                 URL: https://issues.apache.org/jira/browse/HUDI-1652
             Project: Apache Hudi
          Issue Type: Bug
          Components: DeltaStreamer
    Affects Versions: 0.6.0
            Reporter: wangmeng


We encountered a problem in our Hudi production environment that is very 
similar to HUDI-945.
*Software environment:* Spark 2.4.5, Hudi 0.6
*Scenario:* consume Kafka data and write it to Hudi, using Spark Streaming 
(not Structured Streaming).
*Problem:* Over time, the number of /tmp/***** file handles held by the 
executor process keeps increasing.

"

/tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0
/tmp/49251680-0efd-4cc4-a55e-1af2038d3900
/tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9

"
*Reason analysis:* The HoodieMergeHandle class uses ExternalSpillableMap, which 
relies on DiskBasedMap to flush overflowed data to disk. However, the file 
stream is closed and the file deleted only by the shutdown hook when the JVM 
exits. When the clear method is executed in the program, the stream is not 
closed and the file is not deleted. As a result, more and more file handles 
remain held over time, leading to errors. This error is similar to HUDI-945.
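
To illustrate the behavior the analysis above asks for, here is a minimal sketch of a spill file whose clear method closes the stream and deletes the file right away, rather than waiting for a JVM shutdown hook. It is not Hudi's DiskBasedMap code: the class SpillFileSketch and its methods (append, clear, isOpen, file) are hypothetical names chosen for this example, and only standard JDK classes are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical stand-in for a disk-backed spill file; NOT Hudi's DiskBasedMap.
class SpillFileSketch implements AutoCloseable {
    private final File file;
    private RandomAccessFile handle; // null once the spill file has been cleared

    SpillFileSketch(String baseDir) {
        // Random UUID file name, similar in shape to the /tmp/<uuid> entries in the report.
        this.file = new File(baseDir, UUID.randomUUID().toString());
        try {
            this.handle = new RandomAccessFile(file, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends the UTF-8 bytes of value to the end of the spill file.
    void append(String value) {
        if (handle == null) {
            throw new IllegalStateException("spill file already cleared");
        }
        try {
            handle.seek(handle.length());
            handle.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    File file() {
        return file;
    }

    boolean isOpen() {
        return handle != null;
    }

    // Releases the file descriptor and deletes the file immediately,
    // instead of leaving both to a hook that only runs at JVM exit.
    void clear() {
        if (handle == null) {
            return; // already cleared; calling clear twice is harmless
        }
        try {
            handle.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            handle = null;
            file.delete(); // best effort; result ignored in this sketch
        }
    }

    @Override
    public void close() {
        clear();
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        File spilled;
        // try-with-resources guarantees clear() runs even if append fails.
        try (SpillFileSketch spill = new SpillFileSketch(tmp)) {
            spill.append("record-1");
            spilled = spill.file();
            System.out.println("exists while open: " + spilled.exists());
        }
        System.out.println("exists after close: " + spilled.exists());
    }
}
```

In a long-running executor, tying cleanup to clear (or to close via try-with-resources) keeps the number of open descriptors bounded by the number of live maps rather than by the number of maps ever created.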



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
