[ 
https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064352#comment-14064352
 ] 

stack commented on HBASE-8894:
------------------------------

Went through this patch and JIRA.  Reviewing to ensure we've picked-over all 
the good stuff contained herein.

The bulk of this patch is the bucket cache patch HBASE-7404 with some 
difference in formatting/style, cleanup and some extra tests.  Other 
differences are features most of which we now have in trunk (will doc those we 
don't below) and a hard-coded behavior policy for L1 and L2 tiers (the 
eviction/cache behavior between tiers).

If we were to bring this in, it'd be a new caching implementation altogether.  
At a minimum, we'd need to refactor to remove dup of base BucketCache classes, 
classes we already have in trunk.  Besides, in trunk, the tendency is to 
encapsulate caching policy in a BlockCache implementation (e.g. 
CombinedBlockCache) so to keep in mode w/ trunk, we'd refactor this code to 
have a policy that implemented this particular L1 to L2 relationship.  A 
caching policy would also manage how cache-on-write or cache-on-read works.  
This patch pollutes base classes HFileReaderV2, etc., with the details of this 
particular caching policy's cache-on-write/cache-on-read details which would 
have to be redone too.

But I think we have the bulk of what this patch does in trunk now caveat a few 
diffs listed below.

bq. ..compressed and encoded second-level cache.

Inspired by this patch, HBASE-11331 adds this.  It is yet to be committed. A 
report shows compression makes for much less throughput, more CPU (more GC), 
with slightly less i/o on a smallish heap (TODO: test on larger heap).

bq. "Only handle byte arrays (i.e., no more serialization/deserialization 
within)"

The bucketcache serialization in trunk does ByteBuffer reuse/slicing when 
reading (new object, no 'serializing').  On or offheap cache, some allocation 
of HFileBlock header info is done. So some 'serialization' cost but not too 
much if I am reading correctly.  I may have missed something here?

bq.  Remove persistence support for the time being

We kept this support in trunk when we brought over HBASE-7404

bq. Keep an index of hfilename to blocks for efficient eviction on close

Trunk has this. It does not allow different config for L1 and L2; it just 
evicts blocks from all caches when a file closes by default ('Cache policy' can 
change this behavior).

This patch adds cache-on-read but it has a particular semantic in that it is 
whether to cache-on-read the block into L1 when fetched from L2. In trunk we  
just 'cache-on-read' always (letting the cache 'policy' decide where the block 
should go).

The builder for CacheConfig in this patch is sweet but will pass on it for now 
since it would ripple through code base.  May be back for it later.

bq. Flush: 1. Write the already compressed and serialized data to L2 cache 
along with disk.

If I am reading correctly, we do this in trunk (To be confirmed).

bq. ...there's ability to keep very hot blocks (both data and meta blocks) in 
the regular BlockCache, which becomes the L1 cache. 

This has been added to master.  You have to enable it but it is possible. An 
improvement would be HBASE-11331 moving blocks to L1 if hot.

bq. Sending blocks evicted from L1 directly to L2
bq. Evicting blocks from the L2 cache upon promotion to the L1 cache

Still needs to be done in this patch.

There is an option for setting up L1+L2 in trunk such that blocks evicted from 
L1 go to L2  -- 'victimHandler' in LruBlockCache (The second item is not done 
in trunk nor in this patch if I am reading correctly)

bq. Issues evicting/draining...

Needs work still in this patch (and in master)

This patch adds clearCache to BlockCache for testing.  I added this to trunk.  
Useful.

We have getOffset in BockCacheKey now in trunk.

We have the cacheconfig changes minus L2_ prefix in master.

Pulls in ByteBufferArray from hadoop.  We don't need to do that.

This patch adds trace logging around bucketcache main ops.  We could do with 
that.

I tried to bring over some of the tests but differences in config names and our 
missing some of the features made it tough, e.g. TestL2BucketCache, so punted.

Here are the lists of TODOs as I see it:

* Cache blocks to L2 on flush?  Check we do this in master.  We are missing 
this config in trunk (it is in this patch).
* Adds explicity L2 metrics.  Currently our metrics are combined.  I think this 
is pretty critical. HBASE-11532
* Add trace logging around eviction and caching to watch better the cache in 
action.


> Forward port compressed l2 cache from 0.89fb
> --------------------------------------------
>
>                 Key: HBASE-8894
>                 URL: https://issues.apache.org/jira/browse/HBASE-8894
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Critical
>         Attachments: HBASE-8894-0.94-v1.txt, HBASE-8894-0.94-v2.txt
>
>
> Forward port Alex's improvement on hbase-7407 from 0.89-fb branch:
> {code}
>   1 r1492797 | liyin | 2013-06-13 11:18:20 -0700 (Thu, 13 Jun 2013) | 43 lines
>   2
>   3 [master] Implements a secondary compressed cache (L2 cache)
>   4
>   5 Author: avf
>   6
>   7 Summary:
>   8 This revision implements compressed and encoded second-level cache with 
> off-heap
>   9 (and optionally on-heap) storage and a bucket-allocator based on 
> HBASE-7404.
>  10
>  11 BucketCache from HBASE-7404 is extensively modified to:
>  12
>  13 * Only handle byte arrays (i.e., no more serialization/deserialization 
> within)
>  14 * Remove persistence support for the time being
>  15 * Keep an  index of hfilename to blocks for efficient eviction on close
>  16
>  17 A new interface (L2Cache) is introduced in order to separate it from the 
> current
>  18 implementation. The L2 cache is then integrated into the classes that 
> handle
>  19 reading from and writing to HFiles to allow cache-on-write as well as
>  20 cache-on-read. Metrics for the L2 cache are integrated into 
> RegionServerMetrics
>  21 much in the same fashion as metrics for the existing (L2) BlockCache.
>  22
>  23 Additionally, CacheConfig class is re-refactored to configure the L2 
> cache,
>  24 replace multile constructors with a Builder, as well as replace static 
> methods
>  25 for instantiating the caches with abstract factories (with singleton
>  26 implementations for both the existing LruBlockCache and the newly 
> introduced
>  27 BucketCache based L2 cache)
>  28
>  29 Test Plan:
>  30 1) Additional unit tests
>  31 2) Stress test on a single devserver
>  32 3) Test on a single-node in shadow cluster
>  33 4) Test on a whole shadow cluster
>  34
>  35 Revert Plan:
>  36
>  37 Reviewers: liyintang, aaiyer, rshroff, manukranthk, adela
>  38
>  39 Reviewed By: liyintang
>  40
>  41 CC: gqchen, hbase-eng@
>  42
>  43 Differential Revision: https://phabricator.fb.com/D837264
>  44
>  45 Task ID: 2325295
>  7 ------------------------------------------------------------------------
>   6 r1492340 | liyin | 2013-06-12 11:36:03 -0700 (Wed, 12 Jun 2013) | 21 lines
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to