wangmeng created HUDI-1652:
------------------------------
Summary: DiskBasedMap:As time goes by, the number of /temp/*****
file handles held by the executor process is increasing
Key: HUDI-1652
URL: https://issues.apache.org/jira/browse/HUDI-1652
Project: Apache Hudi
Issue Type: Bug
Components: DeltaStreamer
Affects Versions: 0.6.0
Reporter: wangmeng
We encountered a problem in our Hudi production environment that is very
similar to HUDI-945.
*Software environment:* Spark 2.4.5, Hudi 0.6
*Scenario:* consume Kafka data and write it to Hudi, using Spark Streaming
(not Structured Streaming).
*Problem:* Over time, the number of /tmp/***** file handles held by the
executor process keeps increasing, for example:
"
/tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0
/tmp/49251680-0efd-4cc4-a55e-1af2038d3900
/tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9
"
*Reason analysis:* HoodieMergeHandle uses ExternalSpillableMap, which in turn
uses DiskBasedMap to spill overflow data to disk. However, the file stream is
closed and the file deleted only by a shutdown hook when the JVM exits. When the
program calls the clear method, the stream is not closed and the file is not
deleted. As a result, the executor process holds more and more file handles
over time, which leads to errors. This issue is similar to HUDI-945.
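
The pattern described above can be illustrated with a minimal, self-contained
sketch. It is not Hudi's actual DiskBasedMap code: the class SpillFile and its
methods are hypothetical names chosen for illustration. It assumes the spill
file is backed by a RandomAccessFile and shows one possible way to release the
descriptor and delete the file as soon as the owner is done with it, keeping
the JVM-exit cleanup only as a fallback.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.UUID;

// Hypothetical stand-in for a disk-backed spill file; NOT Hudi's DiskBasedMap.
public class SpillFile implements AutoCloseable {
    private final File file;
    private final RandomAccessFile handle;
    private boolean closed = false;

    public SpillFile(File dir) throws IOException {
        // Random file name, similar in shape to the /tmp/<uuid> files in the report.
        this.file = new File(dir, UUID.randomUUID().toString());
        this.handle = new RandomAccessFile(file, "rw");
        // Fallback only: this runs at JVM exit, which a long-running
        // streaming executor may not reach for days.
        this.file.deleteOnExit();
    }

    public void write(byte[] bytes) throws IOException {
        handle.write(bytes);
    }

    public File getFile() {
        return file;
    }

    public boolean isClosed() {
        return closed;
    }

    // Close the descriptor and delete the file eagerly instead of
    // waiting for JVM shutdown. Safe to call more than once.
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        handle.close();
        if (file.exists() && !file.delete()) {
            throw new IOException("Could not delete spill file " + file);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        File path;
        // try-with-resources guarantees close() runs even if writing fails.
        try (SpillFile spill = new SpillFile(tmpDir)) {
            spill.write(new byte[] {1, 2, 3});
            path = spill.getFile();
            System.out.println("exists while open: " + path.exists());
        }
        System.out.println("exists after close: " + path.exists());
    }
}
```

If the owning handle (the merge handle, in this report) closed such an object
whenever it finished with the map, the number of open /tmp files would stay
bounded by the number of handles in use at any moment rather than growing
for the lifetime of the executor.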
--
This message was sent by Atlassian Jira
(v8.3.4#803005)