[ 
https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deepankar updated HBASE-16630:
------------------------------
    Attachment: HBASE-16630-v2.patch

Attached v2, this patch changes the logic of defragmentation to the one 
suggested [~vrodionov] and the one used by MemCached where they just choose 
some slabs and evict them completely the code turned out to be much cleaner and 
smaller. The only heuristic is that we avoid the buckets that are the only 
buckets for a BucketSizeInfo and the ones that have blocks that are currently 
in use and order them by their occupancy ratio.

Also added the change suggested [~zjushch] to add the memory type to the 
eviction. 

If this logic looks fine to the community I will go ahead and add a unit test 
to the freeEntireBlocks method

> Fragmentation in long running Bucket Cache
> ------------------------------------------
>
>                 Key: HBASE-16630
>                 URL: https://issues.apache.org/jira/browse/HBASE-16630
>             Project: HBase
>          Issue Type: Bug
>          Components: BucketCache
>    Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3
>            Reporter: deepankar
>            Assignee: deepankar
>         Attachments: HBASE-16630-v2.patch, HBASE-16630.patch
>
>
> As we are running bucket cache for a long time in our system, we are 
> observing cases where some nodes after some time does not fully utilize the 
> bucket cache, in some cases it is even worse in the sense they get stuck at a 
> value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables 
> are configured in-memory for simplicity sake).
> We took a heap dump and analyzed what is happening and saw that is classic 
> case of fragmentation, current implementation of BucketCache (mainly 
> BucketAllocator) relies on the logic that fullyFreeBuckets are available for 
> switching/adjusting cache usage between different bucketSizes . But once a 
> compaction / bulkload happens and the blocks are evicted from a bucket size , 
> these are usually evicted from random places of the buckets of a bucketSize 
> and thus locking the number of buckets associated with a bucketSize and in 
> the worst case of the fragmentation we have seen some bucketSizes with 
> occupancy ratio of <  10 % But they dont have any completelyFreeBuckets to 
> share with the other bucketSize. 
> Currently the existing eviction logic helps in the cases where cache used is 
> more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also 
> done, the eviction (freeSpace function) will not evict anything and the cache 
> utilization will be stuck at that value without any allocations for other 
> required sizes.
> The fix for this we came up with is simple that we do deFragmentation ( 
> compaction) of the bucketSize and thus increasing the occupancy ratio and 
> also freeing up the buckets to be fullyFree, this logic itself is not 
> complicated as the bucketAllocator takes care of packing the blocks in the 
> buckets, we need evict and re-allocate the blocks for all the BucketSizes 
> that dont fit the criteria.
> I am attaching an initial patch just to give an idea of what we are thinking 
> and I'll improve it based on the comments from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to