saintstack commented on a change in pull request #2665:
URL: https://github.com/apache/hbase/pull/2665#discussion_r524794058



##########
File path: src/main/asciidoc/_chapters/architecture.adoc
##########
@@ -2865,26 +2865,51 @@ The first mechanism is store file refresher which is introduced in HBase-1.0+. S
 
For turning this feature on, you should configure `hbase.regionserver.storefile.refresh.period` to a non-zero value. See Configuration section below.
 
-==== Asnyc WAL replication
-The second mechanism for propagation of writes to secondaries is done via “Async WAL Replication” feature and is only available in HBase-1.1+. This works similarly to HBase’s multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. Each secondary replica always receives and observes the writes in the same order that the primary region committed them. In some sense, this design can be thought of as “in-cluster replication”, where instead of replicating to a different datacenter, the data goes to secondary regions to keep secondary region’s in-memory state up to date. The data files are shared between the primary region and the other replicas, so that there is no extra storage overhead. However, the secondary regions will have recent non-flushed data in their memstores, which increases the memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through wal replication to secondaries. When they observe the flush/compaction or bulk load event, the secondary regions replay the event to pick up the new files and drop the old ones.
+[[async.wal.replication]]
+==== Async WAL replication
+The second mechanism for propagation of writes to secondaries is done via the
+“Async WAL Replication” feature. It is only available in HBase-1.1+. This works
+similarly to HBase’s multi-datacenter replication, but instead the data from a
+region is replicated to the secondary regions. Each secondary replica always
+receives and observes the writes in the same order that the primary region
+committed them. In some sense, this design can be thought of as “in-cluster
+replication”, where instead of replicating to a different datacenter, the data
+goes to secondary regions to keep the secondary regions’ in-memory state up to
+date. The data files are shared between the primary region and the other
+replicas, so that there is no extra storage overhead. However, the secondary
+regions will have recent non-flushed data in their memstores, which increases
+the memory overhead. The primary region writes flush, compaction, and bulk load
+events to its WAL as well, which are also replicated through WAL replication
+to secondaries. When they observe the flush/compaction or bulk load event, the
+secondary regions replay the event to pick up the new files and drop the old
+ones.

Review comment:
       Added CRs for the paragraphs I touched. It makes the patch bigger, if that is ok.
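
       For readers wanting to see the payoff of the paragraph above: once secondaries are kept current by async WAL replication, a client can opt into timeline-consistent reads that may be served by a secondary replica. Below is a minimal sketch against the HBase 1.1+ client API; the table name, row key, and column names are placeholder assumptions, not part of this patch.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder table/row/column names; substitute your own schema.
    TableName tableName = TableName.valueOf("t1");
    byte[] row = Bytes.toBytes("row1");
    byte[] family = Bytes.toBytes("cf");
    byte[] qualifier = Bytes.toBytes("col");

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(tableName)) {
      Get get = new Get(row);
      // TIMELINE consistency allows the read to be served by a secondary
      // replica whose memstore is kept current by async WAL replication.
      get.setConsistency(Consistency.TIMELINE);
      Result result = table.get(get);
      // isStale() is true when the answer came from a secondary rather than
      // the primary region.
      System.out.println("value=" + Bytes.toString(result.getValue(family, qualifier))
          + " stale=" + result.isStale());
    }
  }
}
```

       `Result.isStale()` is how a caller learns the row came from a secondary rather than the primary region, which is why the paragraph stresses that secondaries observe writes in the primary's commit order.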




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
