GitHub user BJangir opened a pull request:

    https://github.com/apache/carbondata/pull/2944

    [CARBONDATA-3122]CarbonReader memory leak

    **Issue Detail**
    CarbonReader keeps a List of the initialized RecordReaders, one per split, 
and each split holds its page data for as long as its RecordReader is 
referenced from that List. The readers become eligible for GC only once the 
user returns from the calling method (the List is not cleaned up even in 
`close()`), so until then the last page of every split stays in memory, which 
is not correct. For example, with 1K carbon files, the last page of each file 
(~32K * 100 in size, for 100 String columns in memory) stays in memory until 
the last split is read, so about 3GB of memory is occupied in total (1K * 
32K * 100).
    A heap dump taken with 3 splits after `reader.close()` is called shows 
that the current reader and all readers in the List are still holding memory. 
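    The estimate above can be reproduced with a small calculation. This is 
only a sketch: the class and method names are hypothetical, 1K and 32K are 
taken as 1000 and 32000, and one byte per value is assumed because that is 
what the 1K * 32K * 100 arithmetic implies.

```java
public class LastPageEstimate {

  /**
   * Bytes retained when the last page of every file stays reachable.
   * All parameters are long so the product does not overflow an int.
   */
  static long retainedBytes(long files, long rowsPerPage, long columns, long bytesPerValue) {
    return files * rowsPerPage * columns * bytesPerValue;
  }

  public static void main(String[] args) {
    // Assumption: 1 byte per value, as implied by the arithmetic in the description.
    long bytes = retainedBytes(1000L, 32000L, 100L, 1L);
    System.out.println(bytes + " bytes retained");
  }
}
```

    Real String values usually take more than one byte each, so the actual 
footprint is likely higher than this figure.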
    
![image](https://user-images.githubusercontent.com/12861989/48916831-e09bf100-eea9-11e8-9b58-7a4ed572d72e.png)
    
    
![image](https://user-images.githubusercontent.com/12861989/48917034-d29aa000-eeaa-11e8-8683-666f6f6e57c9.png)
    
    
    **Solution** 
    1. Once a reader is finished, set the `currentReader` entry in the 
RecordReader List to `null`.  
    OR 
    2. Set the future object to `null` in 
org.apache.carbondata.core.scan.processor.DataBlockIterator#close()
    Solution 2 is adopted so that flows other than the CarbonReader flow 
also benefit. 
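    The idea behind solution 2 can be sketched as follows. This is not the 
actual DataBlockIterator code: `PrefetchingIterator`, `holdsPage()` and 
`countPagesStillHeld()` are hypothetical names used only to show that 
dropping the future reference in `close()` makes the page collectable even 
while the reader itself is still referenced from a List.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class ReaderLeakSketch {

  /** Hypothetical stand-in for an iterator that holds one prefetched page. */
  static class PrefetchingIterator implements AutoCloseable {
    // The future keeps the prefetched page reachable until it is cleared.
    private Future<byte[]> future;

    PrefetchingIterator(int pageBytes) {
      this.future = CompletableFuture.completedFuture(new byte[pageBytes]);
    }

    boolean holdsPage() {
      return future != null;
    }

    @Override
    public void close() {
      // Drop the reference so the page can be garbage collected even if
      // this iterator is still referenced from a caller's List.
      future = null;
    }
  }

  /** Creates one iterator per split, closes them all, and counts pages still referenced. */
  static int countPagesStillHeld(int splits, int pageBytes) {
    List<PrefetchingIterator> readers = new ArrayList<>();
    for (int i = 0; i < splits; i++) {
      readers.add(new PrefetchingIterator(pageBytes));
    }
    for (PrefetchingIterator reader : readers) {
      reader.close();
    }
    int held = 0;
    for (PrefetchingIterator reader : readers) {
      if (reader.holdsPage()) {
        held++;
      }
    }
    return held;
  }

  public static void main(String[] args) {
    System.out.println(countPagesStillHeld(3, 1024));
  }
}
```

    Clearing the reference inside `close()` rather than in the caller means 
any code path that closes the iterator gets the same effect, which matches 
the reason given for choosing solution 2.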
    
    **After Fix** 
    
    
![image](https://user-images.githubusercontent.com/12861989/48917009-bd257600-eeaa-11e8-85f6-9e69bdda1908.png)
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
           NA
     - [ ] Any backward compatibility impacted?
           NA
     - [ ] Document update required?
           NA
     - [ ] Testing done
           Manual Test
     - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
           NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BJangir/incubator-carbondata reader_mem_leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2944.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2944
    
----
commit 198c042251f1269a75de51d36d42e5bcd23fe651
Author: BJangir <babulaljangir111@...>
Date:   2018-11-22T17:04:32Z

    [CARBONDATA-3122]CarbonReader memory leak

----

