saintstack commented on a change in pull request #2665:
URL: https://github.com/apache/hbase/pull/2665#discussion_r524794058
##########
File path: src/main/asciidoc/_chapters/architecture.adoc
##########
@@ -2865,26 +2865,51 @@ The first mechanism is store file refresher which is introduced in HBase-1.0+. S
For turning this feature on, you should configure
`hbase.regionserver.storefile.refresh.period` to a non-zero value. See
Configuration section below.
-==== Asnyc WAL replication
-The second mechanism for propagation of writes to secondaries is done via
“Async WAL Replication” feature and is only available in HBase-1.1+. This works
similarly to HBase’s multi-datacenter replication, but instead the data from a
region is replicated to the secondary regions. Each secondary replica always
receives and observes the writes in the same order that the primary region
committed them. In some sense, this design can be thought of as “in-cluster
replication”, where instead of replicating to a different datacenter, the data
goes to secondary regions to keep secondary region’s in-memory state up to
date. The data files are shared between the primary region and the other
replicas, so that there is no extra storage overhead. However, the secondary
regions will have recent non-flushed data in their memstores, which increases
the memory overhead. The primary region writes flush, compaction, and bulk load
events to its WAL as well, which are also replicated through wal
replication to secondaries. When they observe the flush/compaction or bulk
load event, the secondary regions replay the event to pick up the new files and
drop the old ones.
+[[async.wal.replication]]
+==== Async WAL replication
+The second mechanism for propagation of writes to secondaries is done via the
“Async WAL Replication” feature. It is only available in HBase-1.1+. This works
similarly to HBase’s multi-datacenter replication, but instead the data from a
region is replicated to the secondary regions. Each secondary replica always
receives and observes the writes in the same order that the primary region
committed them. In some sense, this design can be thought of as “in-cluster
replication”, where instead of replicating to a different datacenter, the data
goes to secondary regions to keep secondary region’s in-memory state up to
date. The data files are shared between the primary region and the other
replicas, so that there is no extra storage overhead. However, the secondary
regions will have recent non-flushed data in their memstores, which increases
the memory overhead. The primary region writes flush, compaction, and bulk load
events to its WAL as well, which are also replicated through
wal replication to secondaries. When they observe the flush/compaction or
bulk load event, the secondary regions replay the event to pick up the new
files and drop the old ones.
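
For readers who want to try the two mechanisms described in the hunk above, here is a minimal sketch (my addition, not part of the patch) showing where the relevant settings live. `hbase.regionserver.storefile.refresh.period` is taken from the quoted text; `hbase.region.replica.replication.enabled` is my assumption for the async WAL replication switch, and both are really server-side settings that would normally go in hbase-site.xml rather than be set from client code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionReplicaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Mechanism 1: store file refresher (HBase-1.0+). A non-zero period, in
    // milliseconds, makes secondary replicas periodically pick up the primary's
    // newly flushed or compacted store files.
    conf.setInt("hbase.regionserver.storefile.refresh.period", 30000);

    // Mechanism 2: async WAL replication (HBase-1.1+). Assumed property name;
    // it ships the primary's WAL edits, including flush/compaction/bulk-load
    // events, to the secondary replicas.
    conf.setBoolean("hbase.region.replica.replication.enabled", true);

    System.out.println("storefile refresh period = "
        + conf.get("hbase.regionserver.storefile.refresh.period"));
  }
}
```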
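To make the read-side behavior concrete, a second small sketch (again my addition, using the HBase 2.x client API): it creates a table with three region replicas and issues a get that TIMELINE consistency allows a secondary replica to serve, which is where any async WAL replication lag shows up as a stale result. The table name, column family, and row key are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName name = TableName.valueOf("example_table");

      // One primary plus two secondary replicas per region; all three share the
      // same store files, only the memstore contents differ.
      TableDescriptor desc = TableDescriptorBuilder.newBuilder(name)
          .setRegionReplication(3)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
          .build();
      admin.createTable(desc);

      try (Table table = conn.getTable(name)) {
        Get get = new Get(Bytes.toBytes("row1"));
        // TIMELINE lets the read be answered by a secondary replica, which may
        // lag the primary until async WAL replication catches up.
        get.setConsistency(Consistency.TIMELINE);
        Result result = table.get(get);
        System.out.println("served by a possibly lagging replica: " + result.isStale());
      }
    }
  }
}
```
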
Review comment:
Added CRs for the paragraphs I touched. It makes the patch bigger, if that is ok.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]