tao meng created HUDI-1688:
------------------------------
Summary: hudi write should uncache RDDs when the write operation
is finished
Key: HUDI-1688
URL: https://issues.apache.org/jira/browse/HUDI-1688
Project: Apache Hudi
Issue Type: Bug
Components: Spark Integration
Affects Versions: 0.7.0
Reporter: tao meng
Fix For: 0.8.0
Currently, Hudi improves write performance by caching the necessary RDDs;
however, when the write operation is finished, those cached RDDs are never
uncached, which wastes a lot of memory.
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115]
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214
In our environment:
step1: insert 100 GB of data into a Hudi table via Spark (OK)
step2: insert another 100 GB of data into the same Hudi table via Spark again (OOM)
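The fix pattern is straightforward: track every RDD the commit executor
persists, and unpersist them all once the write completes, even on failure.
A minimal sketch of this pattern follows; `FakeRdd` and
`CommitActionExecutor` are hypothetical stand-ins for Spark's JavaRDD and
Hudi's BaseSparkCommitActionExecutor, not actual Hudi code:

```python
# Hypothetical sketch: a commit executor that records each RDD it caches
# and releases them all in a finally block when the write finishes.

class FakeRdd:
    """Stand-in for a Spark RDD; persist/unpersist mirror Spark's API."""

    def __init__(self, name):
        self.name = name
        self.cached = False

    def persist(self):
        self.cached = True
        return self

    def unpersist(self):
        self.cached = False
        return self


class CommitActionExecutor:
    """Caches intermediate RDDs during the write and uncaches them
    once the operation is finished, so memory is not leaked across writes."""

    def __init__(self):
        self._persisted = []

    def _cache(self, rdd):
        # Remember every RDD we persist so it can be released later.
        self._persisted.append(rdd.persist())
        return rdd

    def execute(self):
        try:
            tagged = self._cache(FakeRdd("taggedRecords"))
            statuses = self._cache(FakeRdd("writeStatuses"))
            return [tagged, statuses]
        finally:
            # The fix: uncache everything when the write operation is finished.
            for rdd in self._persisted:
                rdd.unpersist()
            self._persisted.clear()
```

With this pattern, a second large write does not accumulate the cached RDDs
of the first one, avoiding the OOM seen in step2 above.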
--
This message was sent by Atlassian Jira
(v8.3.4#803005)