updating docs from master

Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/b29dfe4b
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/b29dfe4b
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/b29dfe4b

Branch: refs/heads/branch-1.1
Commit: b29dfe4ba852756995e5768ea25bcab11a01402c
Parents: ce1c3d2
Author: Nick Dimiduk <ndimi...@apache.org>
Authored: Sat Aug 12 11:18:42 2017 -0700
Committer: Nick Dimiduk <ndimi...@apache.org>
Committed: Sat Aug 12 11:24:21 2017 -0700

----------------------------------------------------------------------
 .../appendix_contributing_to_documentation.adoc |  2 +-
 src/main/asciidoc/_chapters/architecture.adoc   | 34 ++++++++
 src/main/asciidoc/_chapters/configuration.adoc  | 65 +++++++-------
 src/main/asciidoc/_chapters/datamodel.adoc      |  2 +-
 src/main/asciidoc/_chapters/developer.adoc      | 91 ++++++++++++++++++--
 .../asciidoc/_chapters/getting_started.adoc     | 24 +++---
 src/main/asciidoc/_chapters/hbase-default.adoc  | 42 ++++-----
 src/main/asciidoc/_chapters/ops_mgt.adoc        | 45 ++++++++++
 src/main/asciidoc/_chapters/preface.adoc        |  2 +-
 src/main/asciidoc/_chapters/protobuf.adoc       | 28 +++---
 src/main/asciidoc/_chapters/schema_design.adoc  |  7 +-
 11 files changed, 247 insertions(+), 95 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
----------------------------------------------------------------------
diff --git 
a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc 
b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
index 0d68dce..0337182 100644
--- a/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
+++ b/src/main/asciidoc/_chapters/appendix_contributing_to_documentation.adoc
@@ -55,7 +55,7 @@ see <<developer,developer>>.
 If you spot an error in a string in a UI, utility, script, log message, or 
elsewhere,
 or you think something could be made more clear, or you think text needs to be 
added
 where it doesn't currently exist, the first step is to file a JIRA. Be sure to 
set
-the component to `Documentation` in addition any other involved components. 
Most
+the component to `Documentation` in addition to any other involved components. 
Most
 components have one or more default owners, who monitor new issues which come 
into
 those queues. Regardless of whether you feel able to fix the bug, you should 
still
 file bugs where you see them.

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc 
b/src/main/asciidoc/_chapters/architecture.adoc
index 7f9ba07..ebb0677 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -244,6 +244,40 @@ For additional information on write durability, review the 
link:/acid-semantics.
 
 For fine-grained control of batching of ``Put``s or ``Delete``s, see the 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch%28java.util.List%29[batch]
 methods on Table.
 
+[[async.client]]
+=== Asynchronous Client ===
+
+The asynchronous client is a new API introduced in HBase 2.0 which provides the ability to access HBase asynchronously.
+
+You can obtain an `AsyncConnection` from `ConnectionFactory`, and then get an asynchronous table instance from it to access HBase. When done, close the `AsyncConnection` instance (usually when your program exits).
+
+For the asynchronous table, most methods have the same meaning as in the old `Table` interface, except that the return value is usually wrapped in a `CompletableFuture`. There is no buffering here, so the asynchronous table has no close method and you do not need to close it. It is also thread safe.
+
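+The following is a minimal sketch of this flow. `createAsyncConnection`
+returning a `CompletableFuture<AsyncConnection>` and `getTable` taking the
+callback thread pool are assumptions about the exact API shape, which may
+differ between snapshots of this evolving interface:
+
+[source,java]
+----
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.AsyncConnection;
+import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Get;
+import org.apache.hadoop.hbase.util.Bytes;
+
+public class AsyncClientSketch {
+  public static void main(String[] args) throws Exception {
+    Configuration conf = HBaseConfiguration.create();
+    // Pool that runs the callbacks registered on the returned futures.
+    ExecutorService pool = Executors.newFixedThreadPool(4);
+    // Block for the connection here only to keep the sketch short.
+    try (AsyncConnection conn =
+        ConnectionFactory.createAsyncConnection(conf).get()) {
+      // The asynchronous table needs no close call and is thread safe.
+      conn.getTable(TableName.valueOf("test"), pool)
+          .get(new Get(Bytes.toBytes("row1")))
+          .thenAccept(result -> System.out.println("got: " + result))
+          .join(); // wait so the program does not exit before the callback
+    } finally {
+      pool.shutdown();
+    }
+  }
+}
+----
+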
+There are several differences for scans:
+
+* There is still a `getScanner` method which returns a `ResultScanner`. You 
can use it in the old way and it works like the old 
`ClientAsyncPrefetchScanner`.
+* There is a `scanAll` method which returns all the results at once. It aims to provide a simpler way for small scans where you usually want all of the results at once (see the sketch after this list).
+* The Observer Pattern. There is a scan method which accepts a 
`ScanResultConsumer` as a parameter. It will pass the results to the consumer.
+
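+Continuing the sketch above, a `scanAll` call might look as follows (its
+`CompletableFuture<List<Result>>` return type is an assumption based on the
+description above):
+
+[source,java]
+----
+// Small scan: fetch the whole result set in one shot.
+Scan scan = new Scan().withStartRow(Bytes.toBytes("row1"))
+    .withStopRow(Bytes.toBytes("row9"));
+conn.getTable(TableName.valueOf("test"), pool)
+    .scanAll(scan)
+    .thenAccept(results -> results.forEach(System.out::println))
+    .join();
+----
+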
+Notice that there are two types of asynchronous table: `AsyncTable` and `RawAsyncTable`.
+
+For `AsyncTable`, you need to provide a thread pool when getting it. The 
callbacks registered to the returned CompletableFuture will be executed in that 
thread pool. It is designed for normal users. You are free to do anything in 
the callbacks.
+
+For `RawAsyncTable`, all the callbacks are executed inside the framework thread, so you must not do time-consuming work in the callbacks; otherwise you may block the framework thread and cause a severe performance impact. It is designed for advanced users who want to write high-performance code. See `org.apache.hadoop.hbase.client.example.HttpProxyExample` for how to write fully asynchronous code with `RawAsyncTable`. Note that coprocessor-related methods are only in `RawAsyncTable`.
+
+[[async.admin]]
+=== Asynchronous Admin ===
+
+You can obtain an `AsyncConnection` from `ConnectionFactory`, and then get an `AsyncAdmin` instance from it to access HBase. Notice that there are two `getAdmin` methods for getting an `AsyncAdmin` instance. One method takes an extra thread pool parameter which is used to execute callbacks; it is designed for normal users. The other method doesn't need a thread pool; all the callbacks are executed inside the framework thread, so time-consuming work is not allowed in the callbacks. It is designed for advanced users.
+
+The default `getAdmin` methods will return an `AsyncAdmin` instance which uses default configs. If you want to customize some configs, you can use the `getAdminBuilder` methods to get an `AsyncAdminBuilder` for creating an `AsyncAdmin` instance. Users are free to set only the configs they care about when creating a new `AsyncAdmin` instance.
+
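+A hedged sketch of these entry points, reusing the connection and pool from the
+earlier sketch. The `setOperationTimeout` builder setter is an assumed name for
+illustration; check the `AsyncAdminBuilder` javadoc for the exact methods.
+
+[source,java]
+----
+// Callbacks run in the supplied pool: intended for normal users.
+AsyncAdmin admin = conn.getAdmin(pool);
+
+// Callbacks run on the framework thread: advanced users only; never block here.
+AsyncAdmin frameworkAdmin = conn.getAdmin();
+
+// Set only the configs you care about via the builder.
+AsyncAdmin tunedAdmin = conn.getAdminBuilder()
+    .setOperationTimeout(30, TimeUnit.SECONDS) // assumed setter name
+    .build();
+----
+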
+For the `AsyncAdmin` interface, most methods have the same meaning as in the old `Admin` interface, except that the return value is usually wrapped in a `CompletableFuture`.
+
+For most admin operations, when the returned `CompletableFuture` is done, it means the admin operation has also completed. But for the compact operation, it only means the compact request was sent to HBase; the compaction itself may take some time to finish. Likewise, for the `rollWALWriter` method, it only means the rollWALWriter request was sent to the region server; the operation may take some time to finish.
+
+For region names, we only accept `byte[]` as the parameter type, and it may be a full region name or an encoded region name. For server names, we only accept `ServerName` as the parameter type. For table names, we only accept `TableName` as the parameter type. For `list*` operations, we only accept `Pattern` as the parameter type if you want to do regex matching.
+
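+For example (the `AsyncAdmin` method names below, `compactRegion` and
+`listTableNames`, illustrate the convention and should be checked against the
+javadoc):
+
+[source,java]
+----
+byte[] regionName = Bytes.toBytes("1588230740"); // full or encoded region name
+admin.compactRegion(regionName);
+// Regex matching for list* operations goes through java.util.regex.Pattern.
+admin.listTableNames(Pattern.compile("test-.*"), false);
+----
+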
 [[client.external]]
 === External Clients
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc 
b/src/main/asciidoc/_chapters/configuration.adoc
index ff4bf6a..bf14d11 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -79,11 +79,10 @@ To check for well-formedness and only print output if 
errors exist, use the comm
 .Keep Configuration In Sync Across the Cluster
 [WARNING]
 ====
-When running in distributed mode, after you make an edit to an HBase 
configuration, make sure you copy the content of the _conf/_ directory to all 
nodes of the cluster.
+When running in distributed mode, after you make an edit to an HBase 
configuration, make sure you copy the contents of the _conf/_ directory to all 
nodes of the cluster.
 HBase will not do this for you.
 Use `rsync`, `scp`, or another secure mechanism for copying the configuration 
files to your nodes.
-For most configuration, a restart is needed for servers to pick up changes An 
exception is dynamic configuration.
-to be described later below.
+For most configurations, a restart is needed for servers to pick up changes. Dynamic configuration, described below, is an exception.
 ====
 
 [[basic.prerequisites]]
@@ -131,11 +130,11 @@ DNS::
   HBase uses the local hostname to self-report its IP address. Both forward 
and reverse DNS resolving must work in versions of HBase previous to 0.92.0. 
The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool 
can be used to verify DNS is working correctly on the cluster. The project 
`README` file provides detailed instructions on usage.
 
 Loopback IP::
-  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer 
to `localhost`, and this could not be configured.
+  Prior to hbase-0.96.0, HBase only used the IP address `127.0.0.1` to refer 
to `localhost`, and this was not configurable.
   See <<loopback.ip,Loopback IP>> for more details.
 
 NTP::
-  The clocks on cluster nodes should be synchronized. A small amount of 
variation is acceptable, but larger amounts of skew can cause erratic and 
unexpected behavior. Time synchronization is one of the first things to check 
if you see unexplained problems in your cluster. It is recommended that you run 
a Network Time Protocol (NTP) service, or another time-synchronization 
mechanism, on your cluster, and that all nodes look to the same service for 
time synchronization. See the 
link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP 
Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up 
NTP.
+  The clocks on cluster nodes should be synchronized. A small amount of 
variation is acceptable, but larger amounts of skew can cause erratic and 
unexpected behavior. Time synchronization is one of the first things to check 
if you see unexplained problems in your cluster. It is recommended that you run 
a Network Time Protocol (NTP) service, or another time-synchronization 
mechanism on your cluster and that all nodes look to the same service for time 
synchronization. See the 
link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP 
Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up 
NTP.
 
 [[ulimit]]
 Limits on Number of Files and Processes (ulimit)::
@@ -176,8 +175,8 @@ Linux Shell::
   All of the shell scripts that come with HBase rely on the 
link:http://www.gnu.org/software/bash[GNU Bash] shell.
 
 Windows::
-  Prior to HBase 0.96, testing for running HBase on Microsoft Windows was 
limited.
-  Running a on Windows nodes is not recommended for production systems.
+  Prior to HBase 0.96, running HBase on Microsoft Windows was limited to testing purposes only.
+  Running production systems on Windows machines is not recommended.
 
 
 [[hadoop]]
@@ -261,8 +260,8 @@ Because HBase depends on Hadoop, it bundles an instance of 
the Hadoop jar under
 The bundled jar is ONLY for use in standalone mode.
 In distributed mode, it is _critical_ that the version of Hadoop that is out 
on your cluster match what is under HBase.
 Replace the hadoop jar found in the HBase lib directory with the hadoop jar 
you are running on your cluster to avoid version mismatch issues.
-Make sure you replace the jar in HBase everywhere on your cluster.
-Hadoop version mismatch issues have various manifestations but often all looks 
like its hung up.
+Make sure you replace the jar in HBase across your whole cluster.
+Hadoop version mismatch issues have various manifestations, but often everything simply looks like it is hung.
 ====
 
 [[dfs.datanode.max.transfer.threads]]
@@ -332,7 +331,7 @@ data must persist across node comings and goings. Writing to
 HDFS where data is replicated ensures the latter.
 
 To configure this standalone variant, edit your _hbase-site.xml_
-setting the _hbase.rootdir_ to point at a directory in your
+setting _hbase.rootdir_ to point at a directory in your
 HDFS instance but then set _hbase.cluster.distributed_
 to _false_. For example:
 
@@ -372,18 +371,18 @@ Some of the information that was originally in this 
section has been moved there
 ====
 
 A pseudo-distributed mode is simply a fully-distributed mode run on a single 
host.
-Use this configuration testing and prototyping on HBase.
-Do not use this configuration for production nor for evaluating HBase 
performance.
+Use this HBase configuration for testing and prototyping purposes only.
+Do not use this configuration for production or for performance evaluation.
 
 [[fully_dist]]
 === Fully-distributed
 
 By default, HBase runs in standalone mode.
 Both standalone mode and pseudo-distributed mode are provided for the purposes 
of small-scale testing.
-For a production environment, distributed mode is appropriate.
+For a production environment, distributed mode is advised.
 In distributed mode, multiple instances of HBase daemons run on multiple 
servers in the cluster.
 
-Just as in pseudo-distributed mode, a fully distributed configuration requires 
that you set the `hbase-cluster.distributed` property to `true`.
+Just as in pseudo-distributed mode, a fully distributed configuration requires 
that you set the `hbase.cluster.distributed` property to `true`.
 Typically, the `hbase.rootdir` is configured to point to a highly-available 
HDFS filesystem.
 
 In addition, the cluster is configured so that multiple cluster nodes enlist 
as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers.
@@ -508,7 +507,7 @@ Just as in Hadoop where you add site-specific HDFS 
configuration to the _hdfs-si
 For the list of configurable properties, see 
<<hbase_default_configurations,hbase default configurations>> below or view the 
raw _hbase-default.xml_ source file in the HBase source code at 
_src/main/resources_.
 
 Not all configuration options make it out to _hbase-default.xml_.
-Configuration that it is thought rare anyone would change can exist only in 
code; the only way to turn up such configurations is via a reading of the 
source code itself.
+Some configurations only appear in source code; the only way to identify them is through code review.
 
 Currently, changes here will require a cluster restart for HBase to notice the 
change.
 // hbase/src/main/asciidoc
@@ -543,11 +542,11 @@ If you are running HBase in standalone mode, you don't 
need to configure anythin
 Since the HBase Master may move around, clients bootstrap by looking to 
ZooKeeper for current critical locations.
 ZooKeeper is where all these values are kept.
 Thus clients require the location of the ZooKeeper ensemble before they can do 
anything else.
-Usually this the ensemble location is kept out in the _hbase-site.xml_ and is 
picked up by the client from the `CLASSPATH`.
+Usually this ensemble location is kept out in the _hbase-site.xml_ and is 
picked up by the client from the `CLASSPATH`.
 
 If you are configuring an IDE to run an HBase client, you should include the 
_conf/_ directory on your classpath so _hbase-site.xml_ settings can be found 
(or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
 
-Minimally, a client of HBase needs several libraries in its `CLASSPATH` when 
connecting to a cluster, including:
+Minimally, an HBase client needs several libraries in its `CLASSPATH` when 
connecting to a cluster, including:
 [source]
 ----
 
@@ -562,7 +561,7 @@ slf4j-log4j (slf4j-log4j12-1.5.8.jar)
 zookeeper (zookeeper-3.4.2.jar)
 ----
 
-An example basic _hbase-site.xml_ for client only might look as follows:
+A basic example _hbase-site.xml_ for a client-only setup may look as follows:
 [source,xml]
 ----
 <?xml version="1.0"?>
@@ -598,7 +597,7 @@ If multiple ZooKeeper instances make up your ZooKeeper 
ensemble, they may be spe
 
 === Basic Distributed HBase Install
 
-Here is an example basic configuration for a distributed ten node cluster:
+Here is a basic configuration example for a distributed ten node cluster:
 * The nodes are named `example0`, `example1`, etc., through node `example9` in 
this example.
 * The HBase Master and the HDFS NameNode are running on the node `example0`.
 * RegionServers run on nodes `example1`-`example9`.
@@ -709,10 +708,10 @@ See 
link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the
 ===== `zookeeper.session.timeout`
 
 The default timeout is three minutes (specified in milliseconds). This means 
that if a server crashes, it will be three minutes before the Master notices 
the crash and starts recovery.
-You might like to tune the timeout down to a minute or even less so the Master 
notices failures the sooner.
-Before changing this value, be sure you have your JVM garbage collection 
configuration under control otherwise, a long garbage collection that lasts 
beyond the ZooKeeper session timeout will take out your RegionServer (You might 
be fine with this -- you probably want recovery to start on the server if a 
RegionServer has been in GC for a long period of time).
+You might need to tune the timeout down to a minute or even less so the Master 
notices failures sooner.
+Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this -- you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time.)
 
-To change this configuration, edit _hbase-site.xml_, copy the changed file 
around the cluster and restart.
+To change this configuration, edit _hbase-site.xml_, copy the changed file 
across the cluster and restart.
 
 We set this value high to save our having to field questions up on the mailing 
lists asking why a RegionServer went down during a massive import.
 The usual cause is that their JVM is untuned and they are running into long GC 
pauses.
@@ -728,14 +727,14 @@ See <<zookeeper,zookeeper>>.
 ==== HDFS Configurations
 
 [[dfs.datanode.failed.volumes.tolerated]]
-===== dfs.datanode.failed.volumes.tolerated
+===== `dfs.datanode.failed.volumes.tolerated`
 
 This is the "...number of volumes that are allowed to fail before a DataNode 
stops offering service.
 By default any volume failure will cause a datanode to shutdown" from the 
_hdfs-default.xml_ description.
 You might want to set this to about half the amount of your available disks.
 
-[[hbase.regionserver.handler.count_description]]
-==== `hbase.regionserver.handler.count`
+[[hbase.regionserver.handler.count]]
+===== `hbase.regionserver.handler.count`
 
 This setting defines the number of threads that are kept open to answer 
incoming requests to user tables.
 The rule of thumb is to keep this number low when the payload per request 
approaches the MB (big puts, scans using a large cache) and high when the 
payload is small (gets, small puts, ICVs, deletes). The total size of the 
queries in progress is limited by the setting 
`hbase.ipc.server.max.callqueue.size`.
@@ -751,7 +750,7 @@ You can get a sense of whether you have too little or too 
many handlers by <<rpc
 ==== Configuration for large memory machines
 
 HBase ships with a reasonable, conservative configuration that will work on 
nearly all machine types that people might want to test with.
-If you have larger machines -- HBase has 8G and larger heap -- you might the 
following configuration options helpful.
+If you have larger machines -- HBase has 8G and larger heap -- you might find 
the following configuration options helpful.
 TODO.
 
 [[config.compression]]
@@ -776,10 +775,10 @@ However, as all memstores are not expected to be full all 
the time, less WAL fil
 [[disable.splitting]]
 ==== Managed Splitting
 
-HBase generally handles splitting your regions, based upon the settings in 
your _hbase-default.xml_ and _hbase-site.xml_          configuration files.
+HBase generally handles splitting of your regions based upon the settings in 
your _hbase-default.xml_ and _hbase-site.xml_          configuration files.
 Important settings include `hbase.regionserver.region.split.policy`, 
`hbase.hregion.max.filesize`, `hbase.regionserver.regionSplitLimit`.
 A simplistic view of splitting is that when a region grows to 
`hbase.hregion.max.filesize`, it is split.
-For most use patterns, most of the time, you should use automatic splitting.
+For most usage patterns, you should use automatic splitting.
 See <<manual_region_splitting_decisions,manual region splitting decisions>> 
for more information about manual region splitting.
 
 Instead of allowing HBase to split your regions automatically, you can choose 
to manage the splitting yourself.
@@ -805,8 +804,8 @@ It is better to err on the side of too few regions and 
perform rolling splits la
 The optimal number of regions depends upon the largest StoreFile in your 
region.
 The size of the largest StoreFile will increase with time if the amount of 
data grows.
 The goal is for the largest region to be just large enough that the compaction 
selection algorithm only compacts it during a timed major compaction.
-Otherwise, the cluster can be prone to compaction storms where a large number 
of regions under compaction at the same time.
-It is important to understand that the data growth causes compaction storms, 
and not the manual split decision.
+Otherwise, the cluster can be prone to compaction storms with a large number 
of regions under compaction at the same time.
+It is important to understand that the data growth causes compaction storms 
and not the manual split decision.
 
 If the regions are split into too many large regions, you can increase the 
major compaction interval by configuring `HConstants.MAJOR_COMPACTION_PERIOD`.
 HBase 0.90 introduced `org.apache.hadoop.hbase.util.RegionSplitter`, which 
provides a network-IO-safe rolling split of all regions.
@@ -866,9 +865,9 @@ You might also see the graphs on the tail of 
link:https://issues.apache.org/jira
 This section is about configurations that will make servers come back faster 
after a fail.
 See the Deveraj Das and Nicolas Liochon blog post 
link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction
 to HBase Mean Time to Recover (MTTR)] for a brief introduction.
 
-The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 
forces Namenode into loop with lease recovery requests] is messy but has a 
bunch of good discussion toward the end on low timeouts and how to effect 
faster recovery including citation of fixes added to HDFS. Read the Varun 
Sharma comments.
+The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 
forces Namenode into loop with lease recovery requests] is messy but has a 
bunch of good discussion toward the end on low timeouts and how to cause faster 
recovery including citation of fixes added to HDFS. Read the Varun Sharma 
comments.
 The below suggested configurations are Varun's suggestions distilled and 
tested.
-Make sure you are running on a late-version HDFS so you have the fixes he 
refers too and himself adds to HDFS that help HBase MTTR (e.g.
+Make sure you are running on a late-version HDFS so you have the fixes he refers to, and the ones he himself added to HDFS, that help HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- Hadoop 2 for sure has them and late Hadoop 1 has some). Set the following in the RegionServer.
 
 [source,xml]
@@ -932,7 +931,7 @@ And on the NameNode/DataNode side, set the following to 
enable 'staleness' intro
 
 JMX (Java Management Extensions) provides built-in instrumentation that 
enables you to monitor and manage the Java VM.
 To enable monitoring and management from remote systems, you need to set 
system property `com.sun.management.jmxremote.port` (the port number through 
which you want to enable JMX RMI connections) when you start the Java VM.
-See the 
link:http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html[official
 documentation] for more information.
+See the 
link:http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html[official
 documentation] for more information.
 Historically, besides above port mentioned, JMX opens two additional random 
TCP listening ports, which could lead to port conflict problem. (See 
link:https://issues.apache.org/jira/browse/HBASE-10289[HBASE-10289] for details)
 
 As an alternative, You can use the coprocessor-based JMX implementation 
provided by HBase.

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc 
b/src/main/asciidoc/_chapters/datamodel.adoc
index 30465fb..da4143a 100644
--- a/src/main/asciidoc/_chapters/datamodel.adoc
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -97,7 +97,7 @@ The colon character (`:`) delimits the column family from the 
column family _qua
 |"com.cnn.www" |t6  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t5  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t3  | contents:html = "<html>..."    | |
-|"com.example.www"| t5  | contents:html = "<html>..."   | people:author = 
"John Doe"
+|"com.example.www"| t5  | contents:html = "<html>..."    | | people:author = 
"John Doe"
 |===
 
 Cells in this table that appear to be empty do not take space, or in fact 
exist, in HBase.

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/developer.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/developer.adoc 
b/src/main/asciidoc/_chapters/developer.adoc
index 50b9c74..6a546fb 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -33,7 +33,7 @@ Being familiar with these guidelines will help the HBase 
committers to use your
 [[getting.involved]]
 == Getting Involved
 
-Apache HBase gets better only when people contribute! If you are looking to 
contribute to Apache HBase, look for 
link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)[issues
 in JIRA tagged with the label 'beginner'].
+Apache HBase gets better only when people contribute! If you are looking to 
contribute to Apache HBase, look for 
link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)[issues
 in JIRA tagged with the label 'beginner'].
 These are issues HBase contributors have deemed worthy but not of immediate 
priority and a good way to ramp on HBase internals.
 See link:http://search-hadoop.com/m/DHED43re96[What label
                 is used for issues that are good on ramps for new 
contributors?] from the dev mailing list for background.
@@ -67,13 +67,90 @@ FreeNode offers a web-based client, but most people prefer 
a native client, and
 Check for existing issues in 
link:https://issues.apache.org/jira/browse/HBASE[Jira].
 If it's either a new feature request, enhancement, or a bug, file a ticket.
 
-To check for existing issues which you can tackle as a beginner, search for 
link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)[issues
 in JIRA tagged with the label 'beginner'].
+We track multiple types of work in JIRA:
 
-* .JIRA PrioritiesBlocker: Should only be used if the issue WILL cause data 
loss or cluster instability reliably.
-* Critical: The issue described can cause data loss or cluster instability in 
some cases.
-* Major: Important but not tragic issues, like updates to the client API that 
will add a lot of much-needed functionality or significant bugs that need to be 
fixed but that don't cause data loss.
-* Minor: Useful enhancements and annoying but not damaging bugs.
-* Trivial: Useful enhancements but generally cosmetic.
+- Bug: Something is broken in HBase itself.
+- Test: A test is needed, or a test is broken.
+- New feature: You have an idea for new functionality. It's often best to bring
+  these up on the mailing lists first, and then write up a design specification
+  that you add to the feature request JIRA.
+- Improvement: A feature exists, but could be tweaked or augmented. It's often
+  best to bring these up on the mailing lists first and have a discussion, then
+  summarize or link to the discussion if others seem interested in the
+  improvement.
+- Wish: This is like a new feature, but for something you may not have the
+  background to flesh out yourself.
+
+Bugs and tests have the highest priority and should be actionable.
+
+==== Guidelines for reporting effective issues
+
+* *Search for duplicates*: Your issue may have already been reported. Have a
+  look, realizing that someone else might have worded the summary differently.
++
+Also search the mailing lists, which may have information about your problem
+and how to work around it. Don't file an issue for something that has already
+been discussed and resolved on a mailing list, unless you strongly disagree
+with the resolution *and* are willing to help take the issue forward.
+
+* *Discuss in public*: Use the mailing lists to discuss what you've discovered
+  and see if there is something you've missed. Avoid using back channels, so
+  that you benefit from the experience and expertise of the project as a whole.
+
+* *Don't file on behalf of others*: You might not have all the context, and you
+  don't have as much motivation to see it through as the person who is actually
+  experiencing the bug. It's more helpful in the long term to encourage others
+  to file their own issues. Point them to this material and offer to help out
+  the first time or two.
+
+* *Write a good summary*: A good summary includes information about the 
problem,
+  the impact on the user or developer, and the area of the code.
+** Good: `Address new license dependencies from hadoop3-alpha4`
+** Room for improvement: `Canary is broken`
++
+If you write a bad title, someone else will rewrite it for you. This is time
+they could have spent working on the issue instead.
+
+* *Give context in the description*: It can be good to think of this in 
multiple
+  parts:
+** What happens or doesn't happen?
+** How does it impact you?
+** How can someone else reproduce it?
+** What would "fixed" look like?
++
+You don't need to know the answers for all of these, but give as much
+information as you can. If you can provide technical information, such as a
+Git commit SHA that you think might have caused the issue or a build failure
+on builds.apache.org where you think the issue first showed up, share that
+info.
+
+* *Fill in all relevant fields*: These fields help us filter, categorize, and
+  find things.
+
+* *One bug, one issue, one patch*: To help with back-porting, don't split 
issues
+  or fixes among multiple bugs.
+
+* *Add value if you can*: Filing issues is great, even if you don't know how to
+  fix them. But providing as much information as possible, being willing to
+  triage and answer questions, and being willing to test potential fixes is 
even
+  better! We want to fix your issue as quickly as you want it to be fixed.
+
+* *Don't be upset if we don't fix it*: Time and resources are finite. In some
+  cases, we may not be able to (or might choose not to) fix an issue, 
especially
+  if it is an edge case or there is a workaround. Even if it doesn't get fixed,
+  the JIRA is a public record of it, and will help others out if they run into
+  a similar issue in the future.
+
+==== Working on an issue
+
+To check for existing issues which you can tackle as a beginner, search for 
link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)[issues
 in JIRA tagged with the label 'beginner'].
+
+.JIRA Priorities
+* *Blocker*: Should only be used if the issue WILL cause data loss or cluster 
instability reliably.
+* *Critical*: The issue described can cause data loss or cluster instability 
in some cases.
+* *Major*: Important but not tragic issues, like updates to the client API 
that will add a lot of much-needed functionality or significant bugs that need 
to be fixed but that don't cause data loss.
+* *Minor*: Useful enhancements and annoying but not damaging bugs.
+* *Trivial*: Useful enhancements but generally cosmetic.
 
 .Code Blocks in Jira Comments
 ====

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/getting_started.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/getting_started.adoc 
b/src/main/asciidoc/_chapters/getting_started.adoc
index 4ffae6d..0e50273 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -145,7 +145,7 @@ NOTE: Java needs to be installed and available.
 If you get an error indicating that Java is not installed,
 but it is on your system, perhaps in a non-standard location,
 edit the _conf/hbase-env.sh_ file and modify the `JAVA_HOME`
-setting to point to the directory that contains _bin/java_ your system.
+setting to point to the directory that contains _bin/java_ on your system.
 
 
 [[shell_exercises]]
@@ -320,8 +320,7 @@ This procedure will create a totally new directory where 
HBase will store its da
 . Configure HBase.
 +
 Edit the _hbase-site.xml_ configuration.
-First, add the following property.
-which directs HBase to run in distributed mode, with one JVM instance per 
daemon.
+First, add the following property which directs HBase to run in distributed 
mode, with one JVM instance per daemon.
 +
 [source,xml]
 ----
@@ -494,15 +493,14 @@ $ cat id_rsa.pub >> ~/.ssh/authorized_keys
 
 . Test password-less login.
 +
-If you performed the procedure correctly, if you SSH from `node-a` to either 
of the other nodes, using the same username, you should not be prompted for a 
password.
+If you performed the procedure correctly, you should not be prompted for a 
password when you SSH from `node-a` to either of the other nodes using the same 
username.
 
 . Since `node-b` will run a backup Master, repeat the procedure above, 
substituting `node-b` everywhere you see `node-a`.
   Be sure not to overwrite your existing _.ssh/authorized_keys_ files, but 
concatenate the new key onto the existing file using the `>>` operator rather 
than the `>` operator.
 
 .Procedure: Prepare `node-a`
 
-`node-a` will run your primary master and ZooKeeper processes, but no 
RegionServers.
-. Stop the RegionServer from starting on `node-a`.
+`node-a` will run your primary master and ZooKeeper processes, but no 
RegionServers. Stop the RegionServer from starting on `node-a`.
 
 . Edit _conf/regionservers_ and remove the line which contains `localhost`. 
Add lines with the hostnames or IP addresses for `node-b` and `node-c`.
 +
@@ -519,7 +517,7 @@ In this demonstration, the hostname is `node-b.example.com`.
 . Configure ZooKeeper
 +
 In reality, you should carefully consider your ZooKeeper configuration.
-You can find out more about configuring ZooKeeper in <<zookeeper,zookeeper>>.
+You can find out more about configuring ZooKeeper in the <<zookeeper,zookeeper>> section.
 This configuration will direct HBase to start and manage a ZooKeeper instance 
on each node of the cluster.
 +
 On `node-a`, edit _conf/hbase-site.xml_ and add the following properties.
@@ -607,7 +605,7 @@ $ jps
 ----
 ====
 +
-.`node-a` `jps` Output
+.`node-c` `jps` Output
 ====
 ----
 $ jps
@@ -621,9 +619,9 @@ $ jps
 [NOTE]
 ====
 The `HQuorumPeer` process is a ZooKeeper instance which is controlled and 
started by HBase.
-If you use ZooKeeper this way, it is limited to one instance per cluster node, 
, and is appropriate for testing only.
+If you use ZooKeeper this way, it is limited to one instance per cluster node 
and is appropriate for testing only.
 If ZooKeeper is run outside of HBase, the process is called `QuorumPeer`.
-For more about ZooKeeper configuration, including using an external ZooKeeper 
instance with HBase, see <<zookeeper,zookeeper>>.
+For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see the <<zookeeper,zookeeper>> section.
 ====
 
 . Browse to the Web UI.
@@ -637,15 +635,15 @@ Master and 60030 for each RegionServer to 16010 for the 
Master and 16030 for the
 +
 If everything is set up correctly, you should be able to connect to the UI for 
the Master
 `http://node-a.example.com:16010/` or the secondary master at 
`http://node-b.example.com:16010/`
-for the secondary master, using a web browser.
+using a web browser.
 If you can connect via `localhost` but not from another host, check your 
firewall rules.
 You can see the web UI for each of the RegionServers at port 16030 of their IP 
addresses, or by
 clicking their links in the web UI for the Master.
 
 . Test what happens when nodes or services disappear.
 +
-With a three-node cluster like you have configured, things will not be very 
resilient.
-Still, you can test what happens when the primary Master or a RegionServer 
disappears, by killing the processes and watching the logs.
+With the three-node cluster you have configured, things will not be very resilient.
+You can still test the behavior of the primary Master or a RegionServer by 
killing the associated processes and watching the logs.
 
 
 === Where to go next

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc 
b/src/main/asciidoc/_chapters/hbase-default.adoc
index 60c0849..6b11945 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -57,7 +57,7 @@ The directory shared by region servers and into
     HDFS directory '/hbase' where the HDFS instance's namenode is
     running at namenode.example.org on port 9000, set this value to:
     hdfs://namenode.example.org:9000/hbase.  By default, we write
-    to whatever ${hbase.tmp.dir} is set too -- usually /tmp --
+    to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
     so change this configuration or else all data will be lost on
     machine restart.
 +
@@ -72,7 +72,7 @@ The directory shared by region servers and into
 The mode the cluster will be in. Possible values are
       false for standalone mode and true for distributed mode.  If
       false, startup will run all HBase and ZooKeeper daemons together
-      in the one JVM.
+      in one JVM.
 +
 .Default
 `false`
@@ -87,11 +87,11 @@ Comma separated list of servers in the ZooKeeper ensemble
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
     By default this is set to localhost for local and pseudo-distributed modes
     of operation. For a fully-distributed setup, this should be set to a full
-    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in 
hbase-env.sh
+    list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in 
hbase-env.sh,
     this is the list of servers which hbase will start/stop ZooKeeper on as
     part of cluster start/stop.  Client-side, we will take this list of
     ensemble members and put it together with the hbase.zookeeper.clientPort
-    config. and pass it into zookeeper constructor as the connectString
+    config and pass it into the zookeeper constructor as the connectString
     parameter.
 +
 .Default
@@ -259,7 +259,7 @@ Factor to determine the number of call queues.
 Split the call queues into read and write queues.
       The specified interval (which should be between 0.0 and 1.0)
       will be multiplied by the number of call queues.
-      A value of 0 indicate to not split the call queues, meaning that both 
read and write
+      A value of 0 indicates not to split the call queues, meaning that both read and write
       requests will be pushed to the same set of queues.
       A value lower than 0.5 means that there will be less read queues than 
write queues.
       A value of 0.5 means there will be the same number of read and write 
queues.
@@ -292,7 +292,7 @@ Given the number of read call queues, calculated from the 
total number
       A value lower than 0.5 means that there will be less long-read queues 
than short-read queues.
       A value of 0.5 means that there will be the same number of short-read 
and long-read queues.
       A value greater than 0.5 means that there will be more long-read queues 
than short-read queues
-      A value of 0 or 1 indicate to use the same set of queues for gets and 
scans.
+      A value of 0 or 1 means that the same set of queues is used for gets and scans.
 
       Example: Given the total number of read call queues being 8
       a scan.ratio of 0 or 1 means that: 8 queues will contain both long and 
short read requests.
@@ -412,7 +412,7 @@ Maximum size of all memstores in a region server before new
 .Description
 Maximum size of all memstores in a region server before flushes are forced.
       Defaults to 95% of hbase.regionserver.global.memstore.size.
-      A 100% value for this value causes the minimum possible flushing to 
occur when updates are
+      A 100% value for this property causes the minimum possible flushing to 
occur when updates are
       blocked due to memstore limiting.
 +
 .Default
@@ -704,7 +704,7 @@ The maximum number of concurrent tasks a single HTable 
instance will
 The maximum number of concurrent connections the client will
     maintain to a single Region. That is, if there is already
     hbase.client.max.perregion.tasks writes in progress for this region, new 
puts
-    won't be sent to this region until some writes finishes.
+    won't be sent to this region until some writes finish.
 +
 .Default
 `1`
@@ -764,8 +764,8 @@ Client scanner lease period in milliseconds.
 *`hbase.bulkload.retries.number`*::
 +
 .Description
-Maximum retries.  This is maximum number of iterations
-    to atomic bulk loads are attempted in the face of splitting operations
+Maximum retries. This is the maximum number of iterations that
+    atomic bulk loads are attempted in the face of splitting operations;
     0 means never give up.
 +
 .Default
@@ -1322,10 +1322,10 @@ This is for the RPC layer to define how long HBase 
client applications
 *`hbase.rpc.shortoperation.timeout`*::
 +
 .Description
-This is another version of "hbase.rpc.timeout". For those RPC operation
+This is another version of "hbase.rpc.timeout". For those RPC operations
         within cluster, we rely on this configuration to set a short timeout 
limitation
-        for short operation. For example, short rpc timeout for region 
server's trying
-        to report to active master can benefit quicker master failover process.
+        for short operations. For example, a short rpc timeout for a region server
+        trying to report to the active master makes for a quicker master failover process.
 +
 .Default
 `10000`
@@ -1336,7 +1336,7 @@ This is another version of "hbase.rpc.timeout". For those 
RPC operation
 +
 .Description
 Set no delay on rpc socket connections.  See
-    
http://docs.oracle.com/javase/1.5.0/docs/api/java/net/Socket.html#getTcpNoDelay()
+    
http://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#getTcpNoDelay--
 +
 .Default
 `true`
@@ -1766,10 +1766,10 @@ How long we wait on dfs lease recovery in total before 
giving up.
 *`hbase.lease.recovery.dfs.timeout`*::
 +
 .Description
-How long between dfs recover lease invocations. Should be larger than the sum 
of
+How long between dfs recover-lease invocations. Should be larger than the sum of
         the time it takes for the namenode to issue a block recovery command 
as part of
-        datanode; dfs.heartbeat.interval and the time it takes for the primary
-        datanode, performing block recovery to timeout on a dead datanode; 
usually
+        datanode dfs.heartbeat.interval and the time it takes for the primary
+        datanode performing block recovery to timeout on a dead datanode, 
usually
         dfs.client.socket-timeout. See the end of HBASE-8389 for more.
 +
 .Default
@@ -2080,7 +2080,7 @@ Fully qualified name of class implementing coordinated 
state manager.
       be initialized. Then, the Filter will be applied to all user facing jsp
       and servlet web pages.
       The ordering of the list defines the ordering of the filters.
-      The default StaticUserWebFilter add a user principal as defined by the
+      The default StaticUserWebFilter adds a user principal as defined by the
       hbase.http.staticuser.user property.
 
 +
@@ -2135,8 +2135,8 @@ Fully qualified name of class implementing coordinated 
state manager.
 +
 .Description
 
-      The user name to filter as, on static web filters
-      while rendering content. An example use is the HDFS
+      The user name to filter as on static web filters
+      while rendering content. For example, the HDFS
       web UI (user to be used for browsing files).
 
 +
@@ -2151,7 +2151,7 @@ Fully qualified name of class implementing coordinated 
state manager.
 The percent of region server RPC threads failed to abort RS.
     -1 Disable aborting; 0 Abort if even a single handler has died;
     0.x Abort only when this percent of handlers have died;
-    1 Abort only all of the handers have died.
+    1 Abort only when all of the handlers have died.
 +
 .Default
 `0.5`

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc 
b/src/main/asciidoc/_chapters/ops_mgt.adoc
index b26e44b..6181b13 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1964,6 +1964,51 @@ In these cases, the user may configure the system to not 
delete any space quota
   </property>
 ----
 
+=== HBase Snapshots with Space Quotas
+
+One common area of unintended filesystem use with HBase is via HBase snapshots. Because snapshots
+exist outside of the management of HBase tables, it is not uncommon for 
administrators to suddenly
+realize that hundreds of gigabytes or terabytes of space is being used by 
HBase snapshots which were
+forgotten and never removed.
+
+link:https://issues.apache.org/jira/browse/HBASE-17748[HBASE-17748] is the 
umbrella JIRA issue which
+expands on the original space quota functionality to also include HBase 
snapshots. While this is a confusing
+subject, the implementation attempts to present this support in as reasonable and simple a manner as
+possible for administrators. This feature does not make any changes to 
administrator interaction with
+space quotas, only in the internal computation of table/namespace usage. Table 
and namespace usage will
+automatically incorporate the size taken by a snapshot per the rules defined 
below.
+
+As a review, let's cover a snapshot's lifecycle: a snapshot is metadata which 
points to
+a list of HFiles on the filesystem. This is why creating a snapshot is a very 
cheap operation; no HBase
+table data is actually copied to perform a snapshot. Cloning a snapshot into a 
new table or restoring
+a table is a cheap operation for the same reason; the new table references the 
files which already exist
+on the filesystem without a copy. To include snapshots in space quotas, we 
need to define which table
+"owns" a file when a snapshot references the file ("owns" refers to 
encompassing the filesystem usage
+of that file).
+
+Consider a snapshot which was made against a table. When the snapshot refers 
to a file and the table no
+longer refers to that file, the "originating" table "owns" that file. When 
multiple snapshots refer to
+the same file and no table refers to that file, the snapshot with the 
lowest-sorting name (lexicographically)
+is chosen and the table which that snapshot was created from "owns" that file. 
HFiles are not "double-counted"
+when a table and one or more snapshots refer to that HFile.
+
+When a table is "rematerialized" (via `clone_snapshot` or `restore_snapshot`), 
a similar problem of file
+ownership arises. In this case, while the rematerialized table references a 
file which a snapshot also
+references, the table does not "own" the file. The table from which the 
snapshot was created still "owns"
+that file. When the rematerialized table is compacted or the snapshot is 
deleted, the rematerialized table
+will uniquely refer to a new file and "own" the usage of that file. Similarly, 
when a table is duplicated via a snapshot
+and `restore_snapshot`, the new table will not consume any quota size until 
the original table stops referring
+to the files, either due to a compaction on the original table, a compaction 
on the new table, or the
+original table being deleted.
+
+One new HBase shell command was added to inspect the computed sizes of each 
snapshot in an HBase instance.
+
+----
+hbase> list_snapshot_sizes
+SNAPSHOT                                      SIZE
+ t1.s1                                        1159108
+----
+
 [[ops.backup]]
 == HBase Backup
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/preface.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/preface.adoc 
b/src/main/asciidoc/_chapters/preface.adoc
index 7d244bd..ed2ca7a 100644
--- a/src/main/asciidoc/_chapters/preface.adoc
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -99,7 +99,7 @@ Tested::
 
 Not Tested::
   In the context of Apache HBase, /not tested/ means that a feature or use 
pattern
-  may or may notwork in a given way, and may or may not corrupt your data or 
cause
+  may or may not work in a given way, and may or may not corrupt your data or 
cause
   operational issues. It is an unknown, and there are no guarantees. If you 
can provide
   proof that a feature designated as /not tested/ does work in a given way, 
please
   submit the tests and/or the metrics so that other users can gain certainty 
about

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/protobuf.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/protobuf.adoc 
b/src/main/asciidoc/_chapters/protobuf.adoc
index 1c2cc47..8c73dd0 100644
--- a/src/main/asciidoc/_chapters/protobuf.adoc
+++ b/src/main/asciidoc/_chapters/protobuf.adoc
@@ -31,7 +31,7 @@
 == Protobuf
 HBase uses Google's link:http://protobuf.protobufs[protobufs] wherever
 it persists metadata -- in the tail of hfiles or Cells written by
-HBase into the system hbase;meta table or when HBase writes znodes
+HBase into the system hbase:meta table or when HBase writes znodes
 to zookeeper, etc. -- and when it passes objects over the wire making
 xref:hbase.rpc[RPCs]. HBase uses protobufs to describe the RPC
 Interfaces (Services) we expose to clients, for example the `Admin` and 
`Client`
@@ -48,15 +48,15 @@ You then feed these descriptors to a protobuf tool, the 
`protoc` binary,
 to generate classes that can marshall and unmarshall the described 
serializations
 and field the specified Services.
 
-See the `README.txt` in the HBase sub-modules for detail on how
+See the `README.txt` in the HBase sub-modules for details on how
 to run the class generation on a per-module basis;
-e.g. see `hbase-protocol/README.txt` for how to generated protobuf classes
+e.g. see `hbase-protocol/README.txt` for how to generate protobuf classes
 in the hbase-protocol module.
 
-In HBase, `.proto` files are either in the `hbase-protocol` module, a module
+In HBase, `.proto` files are in the `hbase-protocol` module, a module
 dedicated to hosting the common proto files and the protoc generated classes
-that HBase uses internally serializing metadata or, for extensions to hbase
-such as REST or Coprocessor Endpoints that need their own descriptors, their
+that HBase uses internally for serializing metadata. For extensions to hbase
+such as REST or Coprocessor Endpoints that need their own descriptors, their
 protos are located inside the function's hosting module: e.g. `hbase-rest`
 is home to the REST proto files and the `hbase-rsgroup` table grouping
 Coprocessor Endpoint has all protos that have to do with table grouping.
@@ -71,7 +71,7 @@ of core HBase protos found back in the hbase-protocol module. 
They'll
 use these core protos when they want to serialize a Cell or a Put or
 refer to a particular node via ServerName, etc., as part of providing the
 CPEP Service. Going forward, after the release of hbase-2.0.0, this
-practice needs to whither. We'll make plain why in the later
+practice needs to wither. We'll explain why in the later
 xref:shaded.protobuf[hbase-2.0.0] section.
 
 [[shaded.protobuf]]
@@ -87,8 +87,8 @@ so hbase core can evolve its protobuf version independent of 
whatever our
 dependencies rely on. For instance, HDFS serializes using protobuf.
 HDFS is on our CLASSPATH. Without the above described indirection, our
 protobuf versions would have to align. HBase would be stuck
-on the HDFS protobuf version until HDFS decided upgrade. HBase
-and HDFS verions would be tied.
+on the HDFS protobuf version until HDFS decided to upgrade. HBase
+and HDFS versions would be tied.
 
 We had to move on from protobuf-2.5.0 because we need facilities
 added in protobuf-3.1.0; in particular being able to save on
@@ -98,10 +98,8 @@ serialization/deserialization.
 In hbase-2.0.0, we introduced a new module, `hbase-protocol-shaded`
 inside which we contained all to do with protobuf and its subsequent
 relocation/shading. This module is in essence a copy of much of the old
-`hbase-protocol` but with an extra shading/relocation step (see the 
`README.txt`
-and the `poms.xml` in this module for more on how to trigger this
-effect and how it all works). Core was moved to depend on this new
-module.
+`hbase-protocol` but with an extra shading/relocation step.
+Core was moved to depend on this new module.
 
 That said, a complication arises around Coprocessor Endpoints (CPEPs).
 CPEPs depend on public HBase APIs that reference protobuf classes at
@@ -127,9 +125,7 @@ HBase needs to be able to deal with both
 `org.apache.hadoop.hbase.shaded.com.google.protobuf.*` protobufs.
 
 The `hbase-protocol-shaded` module hosts all
-protobufs used by HBase core as well as the internal shaded version of
-protobufs that hbase depends on. hbase-client and hbase-server, etc.,
-depend on this module.
+protobufs used by HBase core.
 
 But for the vestigial CPEP references to the (non-shaded) content of
 `hbase-protocol`, we keep around most of this  module going forward

http://git-wip-us.apache.org/repos/asf/hbase/blob/b29dfe4b/src/main/asciidoc/_chapters/schema_design.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc 
b/src/main/asciidoc/_chapters/schema_design.adoc
index 7b85d15..cef05f2 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -40,6 +40,9 @@ any quoted values by ~10 to get what works for HBase: e.g. 
where it says individ
 to go smaller if you can -- and where it says a maximum of 100 column families 
in Cloud Bigtable, think ~10 when
 modeling on HBase.
 
+See also Robert Yokota's 
link:https://blogs.apache.org/hbase/entry/hbase-application-archetypes-redux[HBase
 Application Archetypes]
+(an update on work done by other HBasers) for a helpful categorization of use cases that do well on top of the HBase model.
+
 
 [[schema.creation]]
 ==  Schema Creation
@@ -748,7 +751,7 @@ This approach would be useful if scanning by hostname was a 
priority.
 [[schema.casestudies.log_timeseries.revts]]
 ==== Timestamp, or Reverse Timestamp?
 
-If the most important access path is to pull most recent events, then storing 
the timestamps as reverse-timestamps (e.g., `timestamp = Long.MAX_VALUE – 
timestamp`) will create the property of being able to do a Scan on 
`[hostname][log-event]` to obtain the quickly obtain the most recently captured 
events.
+If the most important access path is to pull the most recent events, then storing the timestamps as reverse-timestamps (e.g., `timestamp = Long.MAX_VALUE - timestamp`) will create the property of being able to do a Scan on `[hostname][log-event]` to obtain the most recently captured events.
 
 Neither approach is wrong, it just depends on what is most appropriate for the 
situation.
 
@@ -1152,7 +1155,7 @@ Detect regionserver failure as fast as reasonable. Set 
the following parameters:
 - `dfs.client.read.shortcircuit = true`
 - `dfs.client.read.shortcircuit.buffer.size = 131072` (Important to avoid OOME)
 * Ensure data locality. In `hbase-site.xml`, set 
`hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n 
\<= 1)
-* Make sure DataNodes have enough handlers for block transfers. In 
`hdfs-site`.xml``, set the following parameters:
+* Make sure DataNodes have enough handlers for block transfers. In 
`hdfs-site.xml`, set the following parameters:
 - `dfs.datanode.max.xcievers >= 8192`
 - `dfs.datanode.handler.count =` number of spindles
 
