[jira] [Commented] (HBASE-9131) Add admin-level documention about configuration and usage of the Bucket Cache

Jonathan Hsieh (JIRA) Tue, 06 Aug 2013 09:07:35 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730883#comment-13730883
 ]


Jonathan Hsieh commented on HBASE-9131:
---------------------------------------

[~zjushch] Thanks.  I think we are somewhere between too little detail and too 
much detail.

First, can we add the config variables to hbase-default.xml (with full 
descriptions and units)?

Now to the meat:

The patch doesn't tell the admin why or when they'd want to consider using 
this.  The linked pdf makes the reader search for the bucket cache sections on 
the 2nd page and then goes into too much design detail for an average admin.  
(It also lacks the config variables and instructions.) 

My suggestion: let's take the high-level parts from section 3 of the pdf, 
polish them, and add them to the official docs. 

Here's a stab at the sections that I think would be good for the ref guide with 
the prose improved a little bit: 

{quote}
*Design and Motivation* 

The Bucket Cache is an alternate block cache implementation that is designed to 
take advantage of large amounts of memory or low-latency storage.   (something 
about how big would be useful).   It is implemented off the JVM heap, which has 
the secondary benefit of reducing the JVM heap fragmentation that eventually 
causes stop-the-world JVM garbage collection operations. If one were to rely 
upon the standard JVM memory allocation and GC policies with large heaps 
(>16GB RAM), one would periodically incur instability in HBase due to long 
stop-the-world GC pauses (tens of seconds to minutes) that can be 
misinterpreted as region server failures.

The storage of cached blocks is not constrained to RAM; one could cache blocks 
in memory and also use high-speed storage, such as SSDs, Fusion-IO devices, or 
ram-disks, as a massive secondary cache.  (probably need something about the 
persistence properties not being required, but having the massive capacity as 
a huge benefit.)

Internally, the bucket cache divides storage into many *buckets*, each of which 
contains blocks of a particular range of sizes.  (this is a little fuzzy, needs 
some clarification).  Insertions and evictions of blocks backed by physical 
storage simply overwrite blocks on the device; lookups read data from the 
storage device.  Managing these larger blocks prevents the external 
fragmentation that causes GC pauses, at the cost of some minor wasted space 
(internal fragmentation).

*Configuration and Usage*

To configure the bucket cache... (something along the lines of what the current 
patch has)....

{quote}
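
To make the bucket idea in the draft above a bit more concrete, here is a 
minimal Python sketch. It is not HBase code: the size classes and the function 
names (pick_size_class, internal_fragmentation) are hypothetical and chosen 
only for illustration. It assumes each block is stored in the smallest 
fixed-size slot that can hold it, which is one plausible reading of "blocks of 
a particular range of sizes", and shows where the internal fragmentation comes 
from.

```python
import bisect

# Hypothetical slot sizes in bytes (assumption for illustration only;
# these are not the sizes HBase actually uses).
SIZE_CLASSES = [4 * 1024, 8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024]


def pick_size_class(block_size):
    """Return the smallest size class that can hold block_size bytes."""
    idx = bisect.bisect_left(SIZE_CLASSES, block_size)
    if idx == len(SIZE_CLASSES):
        raise ValueError(
            "block of %d bytes exceeds the largest size class" % block_size
        )
    return SIZE_CLASSES[idx]


def internal_fragmentation(block_size):
    """Bytes wasted when the block is stored in its fixed-size slot."""
    return pick_size_class(block_size) - block_size


if __name__ == "__main__":
    # Print block size, chosen slot size, and wasted bytes for a few blocks.
    for size in (3000, 8192, 9000, 65536):
        print(size, pick_size_class(size), internal_fragmentation(size))
```

Because every slot in a bucket has the same size, an evicted block can be 
overwritten in place by any later block of the same class, so free space never 
splinters into unusable gaps. The price is the unused tail of each slot.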

Let me know what you think, and feel free to update/correct the draft.
                
> Add admin-level documention about configuration and usage of the Bucket Cache
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9131
>                 URL: https://issues.apache.org/jira/browse/HBASE-9131
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jonathan Hsieh
>         Attachments: hbase-9131.patch
>
>
> HBASE-7404 added the bucket cache but its configuration settings are 
> currently undocumented.  Without documentation developers would be the only 
> ones aware of the feature.
> Specifically documentation about slide 23 from 
> http://www.slideshare.net/cloudera/operations-session-4 would be great to add!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
