tao meng created HUDI-1688:
------------------------------

             Summary: Hudi write should uncache RDDs when the write operation 
is finished
                 Key: HUDI-1688
                 URL: https://issues.apache.org/jira/browse/HUDI-1688
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.7.0
            Reporter: tao meng
             Fix For: 0.8.0


Currently, Hudi improves write performance by caching the necessary RDDs; however, when the 
write operation is finished, those cached RDDs are never uncached, which 
wastes a lot of memory.

[https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115]

https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214

In our environment:

step1: insert 100 GB of data into a Hudi table via Spark (ok)

step2: insert another 100 GB of data into the same Hudi table via Spark (OOM)
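The fix amounts to tracking every RDD the commit executor persists and unpersisting all of them once the write completes. Below is a minimal, illustrative sketch of that pattern; `FakeRdd` and `CommitActionExecutor` are hypothetical stand-ins (not Hudi's or Spark's actual classes), showing only the "register on persist, release in finally" lifecycle:

```python
# Illustrative sketch only: FakeRdd and CommitActionExecutor are
# hypothetical stand-ins for Spark's JavaRDD and Hudi's
# BaseSparkCommitActionExecutor, demonstrating the cleanup pattern.

class FakeRdd:
    def __init__(self, name):
        self.name = name
        self.cached = False

    def persist(self):
        self.cached = True
        return self

    def unpersist(self):
        self.cached = False


class CommitActionExecutor:
    """Tracks every RDD it caches so it can release them when done."""

    def __init__(self):
        self._cached = []

    def _persist(self, rdd):
        # Remember each cached RDD so cleanup cannot miss one.
        self._cached.append(rdd.persist())
        return rdd

    def execute(self, input_rdd):
        try:
            records = self._persist(input_rdd)
            statuses = self._persist(FakeRdd("writeStatuses"))
            return records, statuses
        finally:
            # Without this cleanup the RDDs stay cached after the
            # commit, and memory accumulates across successive writes.
            for rdd in self._cached:
                rdd.unpersist()
            self._cached.clear()


records, statuses = CommitActionExecutor().execute(FakeRdd("input"))
assert not records.cached and not statuses.cached
```

In real Spark code the same effect comes from calling `rdd.unpersist()` (or clearing via the SparkContext's persistent-RDD map) in a finally block, so the cache is released even when the write fails.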



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
