This is an automated email from the ASF dual-hosted git repository.
sakthi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hbase.git
The following commit(s) were added to refs/heads/master by this push:
new 5d42f58 HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
5d42f58 is described below
commit 5d42f58ff604497b083e8e2dae0347f1fb3618fa
Author: Kota-SH <[email protected]>
AuthorDate: Fri Apr 30 13:42:06 2021 -0700
HBASE-25816: Improve the documentation of Architecture section of reference guide (#3211)
Signed-off-by: Sakthi <[email protected]>
---
src/main/asciidoc/_chapters/architecture.adoc | 28 +++++++++++++--------------
src/main/asciidoc/_chapters/hbase_mob.adoc | 10 +++++-----
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 0b12d29..5e27459 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -293,7 +293,7 @@ HMasters instead of ZooKeeper ensemble`
 To reduce hot-spotting on a single master, all the masters (active & stand-by) expose the needed
 service to fetch the connection metadata. This lets the client connect to any master (not just active).
-Both ZooKeeper- and Master-based connection registry implementations are available in 2.3+. For
+Both ZooKeeper-based and Master-based connection registry implementations are available in 2.3+. For
 2.3 and earlier, the ZooKeeper-based implementation remains the default configuration.
 The Master-based implementation becomes the default in 3.0.0.
@@ -437,7 +437,7 @@ ValueFilter vf = new ValueFilter(CompareOperator.EQUAL,
 scan.setFilter(vf);
 ...
 ----
-This scan will restrict to the specified column 'family:qualifier', avoiding scan unrelated
+This scan will restrict to the specified column 'family:qualifier', avoiding scan of unrelated
 families and columns, which has better performance, and `ValueFilter` is the condition used to do
 the value filtering.
@@ -664,7 +664,7 @@ If the active Master loses its lease in ZooKeeper (or the Master shuts down), th
 [[master.runtime]]
 === Runtime Impact
-A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed staring 3.0.0.
+A common dist-list question involves what happens to an HBase cluster when the Master goes down. This information has changed starting 3.0.0.
 ==== Up until releases 2.x.y
 Because the HBase client talks directly to the RegionServers, the cluster can still function in a "steady state". Additionally, per <<arch.catalog>>, `hbase:meta` exists as an HBase table and is not resident in the Master.
@@ -719,7 +719,7 @@ _MasterProcWAL is replaced in hbase-2.3.0 by an alternate Procedure Store implem
 HMaster records administrative operations and their running states, such as the handling of a crashed server,
 table creation, and other DDLs, into a Procedure Store. The Procedure Store WALs are stored under the
 MasterProcWALs directory. The Master WALs are not like RegionServer WALs. Keeping up the Master WAL allows
-us run a state machine that is resilient across Master failures. For example, if a HMaster was in the
+us to run a state machine that is resilient across Master failures. For example, if a HMaster was in the
 middle of creating a table encounters an issue and fails, the next active HMaster can take up where
 the previous left off and carry the operation to completion. Since hbase-2.0.0, a
 new AssignmentManager (A.K.A AMv2) was introduced and the HMaster handles region assignment
@@ -920,7 +920,7 @@ The reason it is included in this equation is that it would be unrealistic to sa
 Here are some examples:
 * One region server with the heap size set to 1 GB and the default block cache size will have 405 MB of block cache available.
-* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 of block cache.
+* 20 region servers with the heap size set to 8 GB and a default block cache size will have 63.3 GB of block cache.
 * 100 region servers with the heap size set to 24 GB and a block cache size of 0.5 will have about 1.16 TB of block cache.
 Your data is not the only resident of the block cache.
@@ -933,7 +933,7 @@ NOTE: The hbase:meta tables can occupy a few MBs depending on the number of regi
 HFiles Indexes::
 An _HFile_ is the file format that HBase uses to store data in HDFS.
- It contains a multi-layered index which allows HBase to seek to the data without having to read the whole file.
+ It contains a multi-layered index which allows HBase to seek the data without having to read the whole file.
 The size of those indexes is a factor of the block size (64KB by default), the size of your keys and the amount of data you are storing.
 For big data sets it's not unusual to see numbers around 1GB per region server, although not all of it will be in cache because the LRU will evict indexes that aren't used.
@@ -974,7 +974,7 @@ Since link:https://issues.apache.org/jira/browse/HBASE-4683[HBASE-4683 Always ca
 [[enable.bucketcache]]
 ===== How to Enable BucketCache
-The usual deploy of BucketCache is via a managing class that sets up two caching tiers:
+The usual deployment of BucketCache is via a managing class that sets up two caching tiers:
 an on-heap cache implemented by LruBlockCache and a second cache implemented with BucketCache.
 The managing class is link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
 The previous link describes the caching 'policy' implemented by CombinedBlockCache.
@@ -1005,7 +1005,7 @@ HBASE-11425 changed the HBase read path so it could hold the read-data off-heap
 See <<regionserver.offheap.readpath>>. In hbase-2.0.0, off-heap latencies approach those of on-heap cache latencies with the added benefit of NOT provoking GC.
 +
-From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support hase been removed.
+From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When BucketCache is turned on, the DATA blocks will always go to BucketCache and INDEX/BLOOM blocks go to on heap LRUBlockCache. `cacheDataInL1` support has been removed.
 ====
 [[bc.deloy.modes]]
@@ -1013,7 +1013,7 @@ From HBase 2.0.0 onwards, the notions of L1 and L2 have been deprecated. When Bu
 The BucketCache Block Cache can be deployed _offheap_, _file_ or _mmaped_ file mode.
 You set which via the `hbase.bucketcache.ioengine` setting.
-Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful specially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_T [...]
+Setting it to `offheap` will have BucketCache make its allocations off-heap, and an ioengine setting of `file:PATH_TO_FILE` will direct BucketCache to use file caching (Useful in particular if you have some fast I/O attached to the box such as SSDs). From 2.0.0, it is possible to have more than one file backing the BucketCache. This is very useful especially when the Cache size requirement is high. For multiple backing files, configure ioengine as `files:PATH_TO_FILE1,PATH_TO_FILE2,PATH_ [...]
 It is possible to deploy a tiered setup where we bypass the CombinedBlockCache policy and have BucketCache working as a strict L2 cache to the L1 LruBlockCache.
 For such a setup, set `hbase.bucketcache.combinedcache.enabled` to `false`.
@@ -1133,7 +1133,7 @@ As write requests are handled by the region server, they accumulate in an in-mem
 Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process however are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:https://hbase.apache.org/devap [...]
-Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
+Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. T [...]
 [[regionserver_split_process_image]]
 .RegionServer Split Process
@@ -1188,14 +1188,14 @@ link:http://en.wikipedia.org/wiki/Write-ahead_logging[Write-Ahead Log] article.
 [[wal.providers]]
 ==== WAL Providers
-In HBase, there are a number of WAL imlementations (or 'Providers'). Each is known
+In HBase, there are a number of WAL implementations (or 'Providers'). Each is known
 by a short name label (that unfortunately is not always descriptive). You set the provider in
-_hbase-site.xml_ passing the WAL provder short-name as the value on the
+_hbase-site.xml_ passing the WAL provider short-name as the value on the
 _hbase.wal.provider_ property (Set the provider for _hbase:meta_ using the _hbase.wal.meta_provider_ property, otherwise it uses the same provider configured by _hbase.wal.provider_).
- * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should [...]
+ * _asyncfs_: The *default*. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This _AsyncFSWAL_ provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently ("fan-out") style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should [...]
 * _filesystem_: This was the default in hbase-1.x releases. It is built on the blocking _DFSClient_ and writes to replicas in classic _DFSCLient_ pipeline mode. In logs it identifies as _FSHLog_ or _FSHLogProvider_.
 * _multiwal_: This provider is made of multiple instances of _asyncfs_ or _filesystem_. See the next section for more on _multiwal_.
@@ -1371,7 +1371,7 @@ The default value for this property is `false`.
 By default, WAL tag compression is turned on when WAL compression is enabled. You can turn off WAL tag compression by setting the `hbase.regionserver.wal.tags.enablecompression` property to 'false'.
-A possible downside to WAL compression is that we lose more data from the last block in the WAL if it ill-terminated
+A possible downside to WAL compression is that we lose more data from the last block in the WAL if it is ill-terminated
 mid-write. If entries in this last block were added with new dictionary entries but we failed persist the amended dictionary because of an abrupt termination, a read of this last block may not be able to resolve last-written entries.
diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc
index 9b67c6e..0e09db1 100644
--- a/src/main/asciidoc/_chapters/hbase_mob.adoc
+++ b/src/main/asciidoc/_chapters/hbase_mob.adoc
@@ -179,10 +179,10 @@ space. The only way to stop using the space of a particular MOB hfile is to ensu
 hold references to it. To do that we need to ensure we have written the current values into a new
 MOB hfile. If our backing filesystem has a limitation on the number of files that can be present, as
 HDFS does, then even if we do not have deletes or updates of MOB cells eventually there will be a
-sufficient number of MOB hfiles that we will need to coallesce them.
+sufficient number of MOB hfiles that we will need to coalesce them.
 Periodically a chore in the master coordinates having the region servers
-perform a special major compaction that also handles rewritting new MOB files. Like all compactions
+perform a special major compaction that also handles rewriting new MOB files. Like all compactions
 the Region Server will create updated hfiles that hold both the cells that are smaller than the MOB
 threshold and cells that hold references to the newly rewritten MOB file. Because this rewriting has
 the advantage of looking across all active cells for the region our several small MOB files should
@@ -237,7 +237,7 @@ To determine if a MOB HFile meets the second criteria the chore extracts metadat
 HFiles for each MOB enabled column family for a given table. That metadata enumerates the complete
 set of MOB HFiles needed to satisfy the references stored in the normal HFile area.
-The period of the cleaner chore can be configued by setting `hbase.master.mob.cleaner.period` to a
+The period of the cleaner chore can be configured by setting `hbase.master.mob.cleaner.period` to a
 positive integer number of seconds. It defaults to running daily. You should not need to tune it
 unless you have a very aggressive TTL or a very high rate of MOB updates with a correspondingly
 high rate of non-MOB compactions.
@@ -247,7 +247,7 @@ high rate of non-MOB compactions.
 ==== Further limiting write amplification
 If your MOB workload has few to no updates or deletes then you can opt-in to MOB compactions that
-optimize for limiting the amount of write amplification. It acheives this by setting a
+optimize for limiting the amount of write amplification. It achieves this by setting a
 size threshold to ignore MOB files during the compaction process. When a given region goes
 through MOB compaction it will evaluate the size of the MOB file that currently holds the actual
 value and skip rewriting the value if that file is over threshold.
@@ -629,7 +629,7 @@ HBase upgrades.
 Prior to the work in HBASE-22749, "Distributed MOB compactions", HBase had the Master coordinate all
 compaction maintenance of the MOB hfiles. Centralizing management of the MOB data allowed for space
-optimizations but safely coordinating that managemet with Region Servers resulted in edge cases that
+optimizations but safely coordinating that management with Region Servers resulted in edge cases that
 caused data loss (ref link:https://issues.apache.org/jira/browse/HBASE-22075[HBASE-22075]).
 Users of the MOB feature upgrading to a version of HBase that includes HBASE-22749 should be aware