[
https://issues.apache.org/jira/browse/HUDI-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan reassigned HUDI-1652:
-----------------------------------------
Assignee: Balaji Varadarajan
> DiskBasedMap: As time goes by, the number of /tmp/***** file handles held by
> the executor process keeps increasing
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-1652
> URL: https://issues.apache.org/jira/browse/HUDI-1652
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Affects Versions: 0.6.0
> Reporter: wangmeng
> Assignee: Balaji Varadarajan
> Priority: Major
> Labels: sev:critical, user-support-issues
>
> We encountered a problem in our Hudi production environment that is very
> similar to HUDI-945.
> *Software environment:* Spark 2.4.5, Hudi 0.6
> *Scenario:* consuming Kafka data and writing to Hudi with Spark Streaming
> (not Structured Streaming).
> *Problem:* As time goes by, the number of /tmp/***** file handles held by
> the executor process keeps increasing:
> "
> /tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0
> /tmp/49251680-0efd-4cc4-a55e-1af2038d3900
> /tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9
> "
> *Reason analysis:* HoodieMergeHandle uses ExternalSpillableMap, which relies
> on DiskBasedMap to spill overflow data to disk. However, the spill file
> streams are only closed, and the backing files deleted, by a shutdown hook
> when the JVM exits. When the program calls the clear method, the stream is
> not closed and the file is not deleted. As a result, the executor holds more
> and more open file handles over time, eventually leading to errors. This
> error is quite similar to HUDI-945.
>
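The leak pattern described above can be illustrated with a minimal sketch. This is not the actual Hudi `DiskBasedMap` code; the class and method names below are hypothetical. It shows the fix direction the analysis implies: closing the spill stream and deleting the backing file eagerly (e.g. from a clear/close method), rather than relying only on a JVM shutdown hook.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of a spill-file wrapper (not the Hudi API).
// close() both closes the stream and deletes the file immediately,
// so handles do not accumulate across streaming batches.
public class SpillFileDemo {

    static final class SpillFile implements AutoCloseable {
        private final File file;
        private final RandomAccessFile stream;

        SpillFile() throws IOException {
            // Created under java.io.tmpdir, like the /tmp/<uuid> files
            // listed in the report.
            this.file = File.createTempFile("hudi-spill-", ".data");
            this.stream = new RandomAccessFile(file, "rw");
            // Shutdown-hook deletion remains only as a safety net.
            file.deleteOnExit();
        }

        void write(byte[] bytes) throws IOException {
            stream.write(bytes);
        }

        boolean exists() {
            return file.exists();
        }

        // Eager cleanup: release the handle and remove the file as soon
        // as the spillable map no longer needs it.
        @Override
        public void close() throws IOException {
            stream.close();
            if (!file.delete()) {
                throw new IOException("could not delete " + file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        SpillFile spill = new SpillFile();
        spill.write("overflow".getBytes());
        System.out.println("before close: " + spill.exists());
        spill.close();
        System.out.println("after close: " + spill.exists());
    }
}
```

With try-with-resources (or an explicit close inside the map's clear method), each streaming batch releases its spill file before the next one starts, instead of deferring all cleanup to JVM exit.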
--
This message was sent by Atlassian Jira
(v8.3.4#803005)