[jira] [Resolved] (HBASE-16630) Fragmentation in long running Bucket Cache

ramkrishna.s.vasudevan (JIRA) Fri, 03 Mar 2017 20:39:07 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ramkrishna.s.vasudevan resolved HBASE-16630.
--------------------------------------------
    Resolution: Fixed

Pushed to all branches, including branch-1.2. There was an environment issue
in my branch-1.2 checkout; I corrected it and committed this patch.
Thanks for all the reviews and for your persistence with this patch [~dvdreddy].

> Fragmentation in long running Bucket Cache
> ------------------------------------------
>
>                 Key: HBASE-16630
>                 URL: https://issues.apache.org/jira/browse/HBASE-16630
>             Project: HBase
>          Issue Type: Bug
>          Components: BucketCache
>    Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>            Reporter: deepankar
>            Assignee: deepankar
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.6
>
>         Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, 
> HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3-branch-1.patch, 
> HBASE-16630-v3-branch-1.X.patch, HBASE-16630-v3.patch, 
> HBASE-16630-v4-branch-1.X.patch
>
>
> We have been running the bucket cache for a long time in our system, and we 
> observe that some nodes, after a while, no longer fully utilize the bucket 
> cache. In the worst cases they get stuck at a value below 0.25 (25%) of the 
> bucket cache (DEFAULT_MEMORY_FACTOR, since all our tables are configured 
> in-memory for simplicity's sake).
> We took a heap dump, analyzed it, and saw a classic case of fragmentation. 
> The current implementation of BucketCache (mainly BucketAllocator) relies on 
> fullyFreeBuckets being available for switching/adjusting cache usage between 
> different bucketSizes. But once a compaction / bulkload happens and blocks 
> are evicted from a bucketSize, they are usually evicted from random places 
> in the buckets of that bucketSize, which locks the number of buckets 
> associated with that bucketSize. In the worst case of fragmentation, we have 
> seen some bucketSizes with an occupancy ratio of < 10% that still don't have 
> any completelyFreeBuckets to share with the other bucketSizes.
> Currently the existing eviction logic helps in cases where the cache used is 
> more than the MEMORY_FACTOR or MULTI_FACTOR. Once those evictions are done, 
> the eviction (freeSpace function) will not evict anything further, and the 
> cache utilization will be stuck at that value without any allocations for 
> other required sizes.
> The fix we came up with is simple: we defragment (compact) the bucketSize, 
> thus increasing the occupancy ratio and also freeing up buckets to be 
> fullyFree. The logic itself is not complicated, as the bucketAllocator takes 
> care of packing the blocks into the buckets; we need to evict and re-allocate 
> the blocks for all the bucketSizes that don't fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.
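
The fragmentation and the proposed defragmentation described above can be
illustrated with a toy model. This is only a sketch under stated assumptions,
not the code from the attached patches and not HBase's BucketAllocator: the
class BucketDefragSketch, its Bucket type, and all method names are
hypothetical, and each bucket is reduced to a fixed number of equally sized
slots. "Occupancy" here is measured over the buckets that still hold at least
one block.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of the buckets assigned to a single bucketSize (hypothetical names). */
final class BucketDefragSketch {

    /** One bucket: a fixed number of block slots, each either used or free. */
    static final class Bucket {
        final BitSet used = new BitSet();
        final int slots;
        Bucket(int slots) { this.slots = slots; }
        int usedCount() { return used.cardinality(); }
        boolean isFullyFree() { return used.isEmpty(); }
    }

    /** Number of buckets that could be handed over to another bucketSize. */
    static long fullyFreeCount(List<Bucket> buckets) {
        return buckets.stream().filter(Bucket::isFullyFree).count();
    }

    /** Used slots divided by total slots, counting only buckets that hold a block. */
    static double occupancy(List<Bucket> buckets) {
        int used = 0;
        int total = 0;
        for (Bucket b : buckets) {
            if (!b.isFullyFree()) {
                used += b.usedCount();
                total += b.slots;
            }
        }
        return total == 0 ? 0.0 : (double) used / total;
    }

    /** "Evict" every live block, then re-allocate them packed into the first buckets. */
    static void defragment(List<Bucket> buckets) {
        int live = 0;
        for (Bucket b : buckets) {
            live += b.usedCount();
            b.used.clear();
        }
        for (Bucket b : buckets) {
            if (live == 0) {
                break;
            }
            int n = Math.min(live, b.slots);
            b.used.set(0, n); // fills slots 0..n-1
            live -= n;
        }
    }

    /** Ten 16-slot buckets with one surviving block each, as after scattered evictions. */
    static List<Bucket> fragmentedExample() {
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Bucket b = new Bucket(16);
            b.used.set(i); // the survivor sits at a different offset in each bucket
            buckets.add(b);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = fragmentedExample();
        System.out.println("before: occupancy=" + occupancy(buckets)
                + " fullyFree=" + fullyFreeCount(buckets));
        defragment(buckets);
        System.out.println("after:  occupancy=" + occupancy(buckets)
                + " fullyFree=" + fullyFreeCount(buckets));
    }
}
```

In this toy setup the ten blocks start out spread over ten buckets (6.25%
occupancy, no fully free bucket). After repacking they share one bucket
(62.5% occupancy), leaving nine fully free buckets that another bucketSize
could claim. A real implementation would also have to move the cached data
and update the block-to-offset index, which this sketch does not model.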



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
