This is an automated email from the ASF dual-hosted git repository.
sakthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hbase.git
The following commit(s) were added to refs/heads/master by this push:
new 5d42f58 HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
5d42f58 is described below
commit 5d42f58ff604497b083e8e2dae0347f1fb3618fa
Author: Kota-SH <[email protected]>
AuthorDate: Fri Apr 30 13:42:06 2021 -0700
HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
Signed-off-by: Sakthi <[email protected]>
---
src/main/asciidoc/_chapters/architecture.adoc | 28 +++++++++++++--------------
src/main/asciidoc/_chapters/hbase_mob.adoc | 10 +++++-----
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 0b12d29..5e27459 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -293,7 +293,7 @@ HMasters instead of ZooKeeper ensemble`
 To reduce hot-spotting on a single master, all the masters (active & stand-by) expose the needed
 service to fetch the connection metadata. This lets the client connect to any master (not just active).
-Both ZooKeeper- and Master-based connection registry implementations are available in 2.3+. For
+Both ZooKeeper-based and Master-based connection registry implementations are available in 2.3+. For
 2.3 and earlier, the ZooKeeper-based implementation remains the default configuration.
 The Master-based implementation becomes the default in 3.0.0.
@@ -437,7 +437,7 @@ ValueFilter vf = new ValueFilter(CompareOperator.EQUAL,
 scan.setFilter(vf);
 ...
 ----
-This scan will restrict to the specified column 'family:qualifier', avoiding scan unrelated
+This scan will restrict to the specified column 'family:qualifier', avoiding scan of unrelated
 families and columns, which has better performance, and `ValueFilter` is the condition used to do
 the value filtering.
@@ -664,7 +664,7 @@ If the active Master loses its lease in ZooKeeper (or the Master shuts down), th
 [[master.runtime]]
 === Runtime Impact
-A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed staring 3.0.0.
+A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed starting 3.0.0.
 ==== Up until releases 2.x.y
 Because the HBase client talks directly to the RegionServers, the cluster can still function in a "steady state". Additionally, per <<arch.catalog>>, `hbase:meta` exists as an HBase table and is not resident in the Master.
@@ -719,7 +719,7 @@ _MasterProcWAL is replaced in hbase-2.3.0 by an alternate Procedure Store implem
 HMaster records administrative operations and their running states, such as the handling of a crashed server,
 table creation, and other DDLs, into a Procedure Store. The Procedure Store WALs are stored under the
 MasterProcWALs directory. The Master WALs are not like RegionServer WALs. Keeping up the Master WAL allows
-us run a state machine that is resilient across Master failures. For example, if a HMaster was in the
+us to run a state machine that is resilient across Master failures. For example, if a HMaster was in the
 middle of creating a table encounters an issue and fails, the next active HMaster can take up where
 the previous left off and carry the operation to completion. Since hbase-2.0.0, a
 new AssignmentManager (A.K.A AMv2) was introduced and the HMaster handles region assignment
@@ -920,7 +920,7 @@ The reason it is included in this equation is that it would be unrealistic to sa
 Here are some examples:
 * One region server with the heap size set to 1 GB and the default block cache size will have 405 MB of block cache available.
-* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 of block cache.
+* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 GB of block cache.
 * 100 region servers with the heap size set to 24 GB and a block cache size of 0.5 will have about 1.16 TB of block cache.
 Your data is not the only resident of the block cache.
@@ -933,7 +933,7 @@ NOTE: The hbase:meta tables can occupy a few MBs depending on the number of regi
 HFiles Indexes::
 An _HFile_ is the file format that HBase uses to store data in HDFS.
- It contains a multi-layered index which allows HBase to seek to the data without having to read the whole file.
+ It contains a multi-layered index which allows HBase to seek the data without having to read the whole file.
 The size of those indexes is a factor of the block size (64KB by default), the size of your keys and the amount of data you are storing.
 For big data sets it's not unusual to see numbers around 1GB per region server, although not all of it will be in cache because the LRU will evict indexes that aren't used.
@@ -974,7 +974,7 @@ Since link:https://issues.apache.org/jira/browse/HBASE-4683[HBASE-4683 Always ca
 [[enable.bucketcache]]
 ===== How to Enable BucketCache
-The usual deploy of BucketCache is via a managing class that sets up two caching tiers:
+The usual deployment of BucketCache is via a managing class that sets up two caching tiers:
 an on-heap cache implemented by LruBlockCache and a second cache implemented with BucketCache.
 The managing class is link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
 The previous link describes the caching 'policy' implemented by CombinedBlockCache.
@@ -1005,7 +1005,7 @@ HBASE-11425 changed the HBase read path so it could hold the read-data off-heap
 See <<regionserver.offheap.readpath>>. In hbase-2.0.0, off-heap latencies approach those of on-heap cache latencies with the added benefit of NOT provoking GC.
 +
-From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support hase been removed.
+From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support has been removed.
 ====
 [[bc.deloy.modes]]
@@ -1013,7 +1013,7 @@ From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When Bu
 The BucketCache Block Cache can be deployed _offheap_, _file_ or _mmaped_ file mode.
 You set which via the `hbase.bucketcache.ioengine` setting.
-Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful specially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_T [...]
+Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful especially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_ [...]
 It is possible to deploy a tiered setup where we bypass the CombinedBlockCache policy and have BucketCache working as a strict L2 cache to the L1 LruBlockCache.
 For such a setup, set `hbase.bucketcache.combinedcache.enabled` to `false`.
@@ -1133,7 +1133,7 @@ As write requests are handled by the region server, they accumulate in an in-mem
 Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process however are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:https://hbase.apache.org/devap [...]
-Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
+Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
 [[regionserver_split_process_image]]
 .RegionServer Split Process
@@ -1188,14 +1188,14 @@ link:http://en.wikipedia.org/wiki/Write-ahead_logging[Write-Ahead Log] article.
 [[wal.providers]]
 ==== WAL Providers
-In HBase, there are a number of WAL imlementations (or 'Providers'). Each is known
+In HBase, there are a number of WAL implementations (or 'Providers'). Each is known
 by a short name label (that unfortunately is not always descriptive). You set the provider in
-_hbase-site.xml_ passing the WAL provder short-name as the value on the
+_hbase-site.xml_ passing the WAL provider short-name as the value on the
 _hbase.wal.provider_ property (Set the provider for _hbase:meta_ using the _hbase.wal.meta_provider_ property, otherwise it uses the same provider configured by _hbase.wal.provider_).
- * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should [...]
+ * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should [...]
 * _filesystem_: This was the default in hbase-1.x releases. It is built on the blocking _DFSClient_ and writes to replicas in classic _DFSCLient_ pipeline mode. In logs it identifies as _FSHLog_ or _FSHLogProvider_.
 * _multiwal_: This provider is made of multiple instances of _asyncfs_ or _filesystem_. See the next section for more on _multiwal_.
@@ -1371,7 +1371,7 @@ The default value for this property is `false`.
 By default, WAL tag compression is turned on when WAL compression is enabled. You can turn off WAL tag compression by setting the `hbase.regionserver.wal.tags.enablecompression` property to 'false'.
-A possible downside to WAL compression is that we lose more data from the last block in the WAL if it ill-terminated
+A possible downside to WAL compression is that we lose more data from the last block in the WAL if it is ill-terminated
 mid-write. If entries in this last block were added with new dictionary entries but we failed persist the amended dictionary because of an abrupt termination, a read of this last block may not be able to resolve last-written entries.
diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc
index 9b67c6e..0e09db1 100644
--- a/src/main/asciidoc/_chapters/hbase_mob.adoc
+++ b/src/main/asciidoc/_chapters/hbase_mob.adoc
@@ -179,10 +179,10 @@ space. The only way to stop using the space of a particular MOB hfile is to ensu
 hold references to it. To do that we need to ensure we have written the current values into a new
 MOB hfile. If our backing filesystem has a limitation on the number of files that can be present, as
 HDFS does, then even if we do not have deletes or updates of MOB cells eventually there will be a
-sufficient number of MOB hfiles that we will need to coallesce them.
+sufficient number of MOB hfiles that we will need to coalesce them.
 Periodically a chore in the master coordinates having the region servers
-perform a special major compaction that also handles rewritting new MOB files. Like all compactions
+perform a special major compaction that also handles rewriting new MOB files. Like all compactions
 the Region Server will create updated hfiles that hold both the cells that are smaller than the MOB
 threshold and cells that hold references to the newly rewritten MOB file. Because this rewriting has
 the advantage of looking across all active cells for the region our several small MOB files should
@@ -237,7 +237,7 @@ To determine if a MOB HFile meets the second criteria the chore extracts metadat
 HFiles for each MOB enabled column family for a given table. That metadata enumerates the complete
 set of MOB HFiles needed to satisfy the references stored in the normal HFile area.
-The period of the cleaner chore can be configued by setting `hbase.master.mob.cleaner.period` to a
+The period of the cleaner chore can be configured by setting `hbase.master.mob.cleaner.period` to a
 positive integer number of seconds. It defaults to running daily. You should not need to tune it
 unless you have a very aggressive TTL or a very high rate of MOB updates with a correspondingly
 high rate of non-MOB compactions.
@@ -247,7 +247,7 @@ high rate of non-MOB compactions.
 ==== Further limiting write amplification
 If your MOB workload has few to no updates or deletes then you can opt-in to MOB compactions that
-optimize for limiting the amount of write amplification. It acheives this by setting a
+optimize for limiting the amount of write amplification. It achieves this by setting a
 size threshold to ignore MOB files during the compaction process. When a given region goes
 through MOB compaction it will evaluate the size of the MOB file that currently holds the actual
 value and skip rewriting the value if that file is over threshold.
@@ -629,7 +629,7 @@ HBase upgrades.
 Prior to the work in HBASE-22749, "Distributed MOB compactions", HBase had the Master coordinate all
 compaction maintenance of the MOB hfiles. Centralizing management of the MOB data allowed for space
-optimizations but safely coordinating that managemet with Region Servers resulted in edge cases that
+optimizations but safely coordinating that management with Region Servers resulted in edge cases that
 caused data loss (ref link:https://issues.apache.org/jira/browse/HBASE-22075[HBASE-22075]).
 Users of the MOB feature upgrading to a version of HBase that includes HBASE-22749 should be aware