[ 
https://issues.apache.org/jira/browse/HBASE-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106413#comment-15106413
 ] 

deepankar commented on HBASE-15101:
-----------------------------------

My understanding is that before HBASE-13082, the output files of a running 
compaction lived in the .tmp directory (of the region folder) and were 
finalized only once the compaction completed. That left only a very small 
window (after the new files were moved in from .tmp and before the old files 
were moved out of the RegionServer) in which all the files could be present 
together. This is no longer the case after HBASE-13082: both sets of files stay 
in the folder for a longer period of time, and if there is any leak in the 
reference counting, all the files coexist, which can lead to a region size 
explosion. 
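
To illustrate the failure mode, here is a minimal sketch of reference-counted 
cleanup. It is not HBase code: RefCountedFile, CompactedFileCleaner and all the 
method names are hypothetical, and I am assuming that a compacted file may be 
archived only once no scanner holds a reference to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a store file whose reader is reference counted. */
class RefCountedFile {
    final String name;
    final boolean compactedAway; // true once a compaction has replaced this file
    private final AtomicInteger refCount = new AtomicInteger();

    RefCountedFile(String name, boolean compactedAway) {
        this.name = name;
        this.compactedAway = compactedAway;
    }

    void scannerOpened() { refCount.incrementAndGet(); }
    void scannerClosed() { refCount.decrementAndGet(); }
    int refCount() { return refCount.get(); }
}

/** Hypothetical cleaner: archives compacted files that nobody references. */
class CompactedFileCleaner {
    /** Removes archivable files from {@code files} and returns them. */
    static List<RefCountedFile> archiveUnreferenced(List<RefCountedFile> files) {
        List<RefCountedFile> archived = new ArrayList<>();
        files.removeIf(f -> {
            if (f.compactedAway && f.refCount() == 0) {
                archived.add(f);
                return true;
            }
            return false; // still referenced (or still live): stays on disk
        });
        return archived;
    }
}

class RefCountLeakDemo {
    public static void main(String[] args) {
        RefCountedFile leaked = new RefCountedFile("old-1", true);
        RefCountedFile clean = new RefCountedFile("old-2", true);
        RefCountedFile live = new RefCountedFile("new-1", false);

        leaked.scannerOpened();   // scanner discarded without close(): leak
        clean.scannerOpened();
        clean.scannerClosed();    // properly released

        List<RefCountedFile> files = new ArrayList<>(List.of(leaked, clean, live));
        List<RefCountedFile> archived = CompactedFileCleaner.archiveUnreferenced(files);
        System.out.println("archived=" + archived.size() + " remaining=" + files.size());
    }
}
```

In this sketch a single missed scannerClosed() is enough to keep "old-1" in 
the folder indefinitely, alongside the file that replaced it.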

This is exactly what happened to us. Without this patch, we were running one 
regionserver with HBASE-13082, and almost all the regions on that server had 
all the files from the time that regionserver started or the region was moved 
to that server (movement rarely happens). Worse, we force a major compaction of 
regions daily, and that led to the region data being repeated over 7 times. 
When, in a panic, we shut down this server (gracefully), the other 
regionservers that took over these regions kept compacting for the whole next 
day (as each region contained 5-7x the data of a normal region). 

We then applied this patch and hosted only two regions on this experimental 
regionserver for 2 days, and the same thing repeated: when we shut down the 
regionserver again (again gracefully), all the files remained in the directory, 
and that led to a longer compaction the next time.

If we can come up with a patch for the remaining leak, maybe I could take a 
stab at testing it again. I will also go through close() to see if I am missing 
anything.

Thanks




> Leaked References to StoreFile.Reader after HBASE-13082
> -------------------------------------------------------
>
>                 Key: HBASE-15101
>                 URL: https://issues.apache.org/jira/browse/HBASE-15101
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, io
>    Affects Versions: 2.0.0
>            Reporter: deepankar
>            Assignee: deepankar
>         Attachments: HBASE-15101-v1.patch, HBASE-15101-v2.patch, 
> HBASE-15101-v3.patch, HBASE-15101.patch
>
>
> We observed in production that after a region server dies there is a huge 
> number of hfiles in the regions of the region server running the version 
> with HBASE-13082. The doc says this is expected to happen, but we found one 
> place where scanners are not being closed. If the scanners are not closed, 
> their references are not decremented, and that leads to a huge number of 
> store files not being finalized.
> All I was able to find is in selectScannersFrom, where we discard some of 
> the scanners and do not close them. I am attaching a patch for that.
> Also, to avoid these issues, should the files that are done be logged and 
> finalized (moved to archive) as part of the region close operation? This 
> would resolve any leaks that can happen and should not cause any dire 
> consequences?
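
The pattern described for selectScannersFrom can be sketched roughly as 
follows. This is only an illustration under assumptions, not the attached patch 
or the actual HBase method: DemoScanner, ScannerSelection and 
selectAndCloseRest are hypothetical names, and the selection rule is passed in 
as a plain predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical scanner that releases its file reference when closed. */
class DemoScanner implements AutoCloseable {
    final String file;
    private boolean closed;

    DemoScanner(String file) { this.file = file; }

    @Override
    public void close() { closed = true; } // real code would decrement a refcount here

    boolean isClosed() { return closed; }
}

class ScannerSelection {
    /** Keeps the scanners accepted by {@code wanted} and closes all the others. */
    static List<DemoScanner> selectAndCloseRest(List<DemoScanner> all,
                                                Predicate<DemoScanner> wanted) {
        List<DemoScanner> kept = new ArrayList<>();
        for (DemoScanner s : all) {
            if (wanted.test(s)) {
                kept.add(s);
            } else {
                s.close(); // dropping the scanner without this call leaks its reference
            }
        }
        return kept;
    }
}
```

The point of the sketch is only that every scanner that is filtered out gets 
an explicit close(), so its reference is released at the time it is discarded.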



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
