http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/schema_design.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/schema_design.adoc 
b/src/main/asciidoc/_chapters/schema_design.adoc
index 9319c65..5cf8d12 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -27,7 +27,19 @@
 :icons: font
 :experimental:
 
-A good general introduction on the strength and weaknesses modelling on the 
various non-rdbms datastores is Ian Varley's Master thesis, 
link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No 
Relation: The Mixed Blessings of Non-Relational Databases]. Also, read 
<<keyvalue,keyvalue>> for how HBase stores data internally, and the section on 
<<schema.casestudies,schema.casestudies>>.
+A good introduction to the strengths and weaknesses of modeling on the various
+non-RDBMS datastores is to be found in Ian Varley's Master's thesis,
+link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases].
+It is a little dated now, but it is a good background read if you have a moment,
+covering how HBase schema modeling differs from how it is done in an RDBMS. Also,
+read <<keyvalue,keyvalue>> for how HBase stores data internally, and the
+section on <<schema.casestudies,schema.casestudies>>.
+
+The documentation on the Cloud Bigtable website, 
link:https://cloud.google.com/bigtable/docs/schema-design[Designing Your 
Schema],
+is pertinent and nicely done and lessons learned there equally apply here in 
HBase land; just divide
+any quoted values by ~10 to get what works for HBase: e.g. where it says 
individual values can be ~10MBs in size, HBase can do similar -- perhaps best
+to go smaller if you can -- and where it says a maximum of 100 column families 
in Cloud Bigtable, think ~10 when
+modeling on HBase.
+
 
 [[schema.creation]]
 ==  Schema Creation
@@ -41,7 +53,7 @@ Tables must be disabled when making ColumnFamily 
modifications, for example:
 
 Configuration config = HBaseConfiguration.create();
 Admin admin = new Admin(conf);
-String table = "myTable";
+TableName table = TableName.valueOf("myTable");
 
 admin.disableTable(table);
 
@@ -64,6 +76,50 @@ When changes are made to either Tables or ColumnFamilies 
(e.g. region size, bloc
 
 See <<store,store>> for more information on StoreFiles.
 
+[[table_schema_rules_of_thumb]]
+== Table Schema Rules Of Thumb
+
+There are many different data sets, with different access patterns and 
service-level
+expectations. Therefore, these rules of thumb are only an overview. Read the 
rest
+of this chapter to get more details after you have gone through this list.
+
+* Aim to have regions sized between 10 and 50 GB.
+* Aim to have cells no larger than 10 MB, or 50 MB if you use <<mob>>. 
Otherwise,
+consider storing your cell data in HDFS and store a pointer to the data in 
HBase.
+* A typical schema has between 1 and 3 column families per table. HBase tables 
should
+not be designed to mimic RDBMS tables.
+* Around 50-100 regions is a good number for a table with 1 or 2 column 
families.
+Remember that a region is a contiguous segment of a column family.
+* Keep your column family names as short as possible. The column family names 
are
+stored for every value (ignoring prefix encoding). They should not be 
self-documenting
+and descriptive like in a typical RDBMS.
+* If you are storing time-based machine data or logging information, and the 
row key
+is based on device ID or service ID plus time, you can end up with a pattern 
where
+older data regions never have additional writes beyond a certain age. In this 
type
+of situation, you end up with a small number of active regions and a large 
number
+of older regions which have no new writes. For these situations, you can 
tolerate
+a larger number of regions because your resource consumption is driven by the 
active
+regions only.
+* If only one column family is busy with writes, only that column family
+accumulates memory. Be aware of write patterns when allocating resources.
+
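The sizing rules above can be sanity-checked with a little arithmetic; the following plain-Java sketch uses hypothetical numbers (a 2 TB table with a 20 GB target region size), not recommendations:

```java
// Rough sizing sketch: how many regions does a table need?
// The input sizes below are hypothetical, not recommendations.
public class RegionSizing {
    public static void main(String[] args) {
        long datasetBytes = 2L * 1024 * 1024 * 1024 * 1024; // 2 TB in one table
        long targetRegionBytes = 20L * 1024 * 1024 * 1024;  // 20 GB, inside the 10-50 GB rule
        // Round up so the last partial region is counted too.
        long regions = (datasetBytes + targetRegionBytes - 1) / targetRegionBytes;
        System.out.println(regions + " regions"); // prints "103 regions"
    }
}
```

At about 100 regions this lands right at the 50-100 guideline above; a smaller table or a larger target region size would bring the count down.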
+[[regionserver_sizing_rules_of_thumb]]
+== RegionServer Sizing Rules of Thumb
+
+Lars Hofhansl wrote a great
+link:http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html[blog
 post]
+about RegionServer memory sizing. The upshot is that you probably need more 
memory
+than you think you need. He goes into the impact of region size, memstore 
size, HDFS
+replication factor, and other things to check.
+
+[quote, Lars Hofhansl, 
http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html]
+____
+Personally I would place the maximum disk space per machine that can be served
+exclusively with HBase around 6T, unless you have a very read-heavy workload.
+In that case the Java heap should be 32GB (20G regions, 128M memstores, the 
rest
+defaults).
+____
+
 [[number.of.cfs]]
 ==  On the number of column families
 
@@ -175,7 +231,7 @@ See this comic by IKai Lan on why monotonically increasing 
row keys are problema
 The pile-up on a single region brought on by monotonically increasing keys can 
be mitigated by randomizing the input records to not be in sorted order, but in 
general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as 
the row-key.
 
 If you do need to upload time series data into HBase, you should study 
link:http://opentsdb.net/[OpenTSDB] as a successful example.
-It has a page describing the link: http://opentsdb.net/schema.html[schema] it 
uses in HBase.
+It has a page describing the link:http://opentsdb.net/schema.html[schema] it 
uses in HBase.
 The key format in OpenTSDB is effectively [metric_type][event_timestamp], 
which would appear at first glance to contradict the previous advice about not 
using a timestamp as the key.
 However, the difference is that the timestamp is not in the _lead_ position of 
the key, and the design assumption is that there are dozens or hundreds (or 
more) of different metric types.
 Thus, even with a continual stream of input data with a mix of metric types, 
the Puts are distributed across various points of regions in the table.
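The OpenTSDB-style `[metric_type][event_timestamp]` key can be sketched in plain Java. The 3-byte metric id and 8-byte big-endian timestamp widths are assumptions for illustration; real HBase client code would typically assemble keys with `org.apache.hadoop.hbase.util.Bytes`:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of an OpenTSDB-style composite key: [metric_type][event_timestamp].
// Field widths are hypothetical; this only illustrates the ordering property.
public class CompositeKey {
    // A 3-byte metric id followed by an 8-byte big-endian timestamp keeps keys
    // fixed-length and sorts rows by metric first, then by time.
    static byte[] rowKey(int metricId, long timestampMs) {
        return ByteBuffer.allocate(3 + 8)
                .put((byte) (metricId >>> 16))
                .put((byte) (metricId >>> 8))
                .put((byte) metricId)
                .putLong(timestampMs)
                .array();
    }

    public static void main(String[] args) {
        byte[] a = rowKey(1, 1700000000000L);
        byte[] b = rowKey(1, 1700000001000L);
        byte[] c = rowKey(2, 1600000000000L);
        // Unsigned lexicographic order (HBase's sort): within one metric rows
        // sort by time, and a different metric never interleaves, even when its
        // timestamp is older.
        System.out.println(Arrays.compareUnsigned(a, b) < 0); // true
        System.out.println(Arrays.compareUnsigned(b, c) < 0); // true
    }
}
```

Because the timestamp is not in the lead position, a continual stream of mixed metric types spreads Puts across the table instead of hammering one region.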
@@ -327,8 +383,8 @@ As an example of why this is important, consider the 
example of using displayabl
 
 The problem is that all the data is going to pile up in the first 2 regions 
and the last region thus creating a "lumpy" (and possibly "hot") region problem.
 To understand why, refer to an link:http://www.asciitable.com[ASCII Table].
-'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values 
(bytes 58 to 96) that will _never appear in this keyspace_ because the only 
values are [0-9] and [a-f]. Thus, the middle regions regions will never be used.
-To make pre-spliting work with this example keyspace, a custom definition of 
splits (i.e., and not relying on the built-in split method) is required.
+'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values 
(bytes 58 to 96) that will _never appear in this keyspace_ because the only 
values are [0-9] and [a-f]. Thus, the middle regions will never be used.
+To make pre-splitting work with this example keyspace, a custom definition of 
splits (i.e., and not relying on the built-in split method) is required.
 
 Lesson #1: Pre-splitting tables is generally a best practice, but you need to 
pre-split them in such a way that all the regions are accessible in the 
keyspace.
 While this example demonstrated the problem with a hex-key keyspace, the same 
problem can happen with _any_ keyspace.
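A minimal sketch of such a custom split definition for a fixed-width hex keyspace, spacing the split points evenly inside `[0-9a-f]` with `BigInteger` (the 16-character key width and region count are hypothetical):

```java
import java.math.BigInteger;

// Sketch: compute evenly spaced split points for a keyspace of 16-character
// hex rowkeys, so pre-split regions actually cover the [0-9a-f] keys instead
// of the unused byte gap a naive byte-midpoint split would land in.
public class HexSplits {
    static String[] hexSplits(String startKey, String endKey, int numRegions) {
        BigInteger lo = new BigInteger(startKey, 16);
        BigInteger range = new BigInteger(endKey, 16).subtract(lo);
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            BigInteger split = lo.add(range.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(numRegions)));
            // Left-pad to the fixed key width so splits compare correctly as strings.
            splits[i - 1] = String.format("%16s", split.toString(16)).replace(' ', '0');
        }
        return splits;
    }

    public static void main(String[] args) {
        for (String s : hexSplits("0000000000000000", "ffffffffffffffff", 4)) {
            System.out.println(s);
        }
        // prints 3fffffffffffffff, 7fffffffffffffff, bfffffffffffffff:
        // every split falls inside the real hex keyspace.
    }
}
```

The resulting split strings would then be passed (as bytes) to table creation rather than relying on the built-in split method.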
@@ -394,7 +450,7 @@ The minimum number of row versions parameter is used 
together with the time-to-l
 HBase supports a "bytes-in/bytes-out" interface via 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put]
 and 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result],
 so anything that can be converted to an array of bytes can be stored as a 
value.
 Input could be strings, numbers, complex objects, or even images as long as 
they can rendered as bytes.
 
-There are practical limits to the size of values (e.g., storing 10-50MB 
objects in HBase would probably be too much to ask); search the mailling list 
for conversations on this topic.
+There are practical limits to the size of values (e.g., storing 10-50MB 
objects in HBase would probably be too much to ask); search the mailing list 
for conversations on this topic.
 All rows in HBase conform to the <<datamodel>>, and that includes versioning.
 Take that into consideration when making your design, as well as block size 
for the ColumnFamily.
 
@@ -502,7 +558,7 @@ ROW                                              COLUMN+CELL
 
 Notice how delete cells are let go.
 
-Now lets run the same test only with `KEEP_DELETED_CELLS` set on the table 
(you can do table or per-column-family):
+Now let's run the same test only with `KEEP_DELETED_CELLS` set on the table 
(you can do table or per-column-family):
 
 [source]
 ----
@@ -593,7 +649,7 @@ However, don't try a full-scan on a large table like this 
from an application (i
 [[secondary.indexes.periodic]]
 ===  Periodic-Update Secondary Index
 
-A secondary index could be created in an other table which is periodically 
updated via a MapReduce job.
+A secondary index could be created in another table which is periodically 
updated via a MapReduce job.
 The job could be executed intra-day, but depending on load-strategy it could 
still potentially be out of sync with the main data table.
 
 See <<mapreduce.example.readwrite,mapreduce.example.readwrite>> for more 
information.
@@ -620,8 +676,13 @@ For more information, see <<coprocessors,coprocessors>>
 == Constraints
 
 HBase currently supports 'constraints' in traditional (SQL) database parlance.
-The advised usage for Constraints is in enforcing business rules for 
attributes in the table (e.g. make sure values are in the range 1-10). 
Constraints could also be used to enforce referential integrity, but this is 
strongly discouraged as it will dramatically decrease the write throughput of 
the tables where integrity checking is enabled.
-Extensive documentation on using Constraints can be found at: 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint[Constraint]
 since version 0.94.
+The advised usage for Constraints is in enforcing business rules for attributes
+in the table (e.g. make sure values are in the range 1-10). Constraints could
+also be used to enforce referential integrity, but this is strongly discouraged
+as it will dramatically decrease the write throughput of the tables where 
integrity
+checking is enabled. Extensive documentation on using Constraints can be found 
at
+link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint]
+since version 0.94.
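As a sketch of the kind of business rule a Constraint would enforce, here is the 1-10 range check from the example above in plain Java. The actual hook is the Constraint interface in the linked package; treat this as the rule only, not the integration:

```java
// Stand-alone version of the "values must be in the range 1-10" business rule
// the text uses as its Constraint example. In a real Constraint implementation
// this check would run against each incoming Put.
public class RangeRule {
    static void check(long value) {
        if (value < 1 || value > 10) {
            throw new IllegalArgumentException("value " + value + " outside [1,10]");
        }
    }

    public static void main(String[] args) {
        check(5); // passes silently
        try {
            check(42);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```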
 
 [[schema.casestudies]]
 == Schema Design Case Studies
@@ -700,7 +761,7 @@ See 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#se
 ====
 
 [[schema.casestudies.log_timeseries.varkeys]]
-==== Variangle Length or Fixed Length Rowkeys?
+==== Variable Length or Fixed Length Rowkeys?
 
 It is critical to remember that rowkeys are stamped on every column in HBase.
 If the hostname is `a` and the event type is `e1` then the resulting rowkey 
would be quite small.
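One common way to get fixed-length rowkeys from variable-length inputs such as hostnames is to hash them; a minimal sketch, using MD5 purely for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: hashing a variable-length hostname into a fixed-length rowkey
// component, so every rowkey has the same width regardless of hostname length.
public class FixedKey {
    static byte[] hostPrefix(String hostname) throws Exception {
        // MD5 always yields 16 bytes, whether the hostname is "a" or very long.
        return MessageDigest.getInstance("MD5")
                .digest(hostname.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hostPrefix("a").length);                            // 16
        System.out.println(hostPrefix("a-very-long-hostname.example").length); // 16
    }
}
```

The trade-off is that the raw hostname is no longer readable from the key, which is why a lookup table for the raw values comes up in the approaches below.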
@@ -721,10 +782,12 @@ Composite Rowkey With Numeric Substitution:
 For this approach another lookup table would be needed in addition to 
LOG_DATA, called LOG_TYPES.
 The rowkey of LOG_TYPES would be:
 
-* [type] (e.g., byte indicating hostname vs. event-type)
-* [bytes] variable length bytes for raw hostname or event-type.
+* `[type]` (e.g., byte indicating hostname vs. event-type)
+* `[bytes]` variable length bytes for raw hostname or event-type.
 
-A column for this rowkey could be a long with an assigned number, which could 
be obtained by using an 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29[HBase
 counter].
+A column for this rowkey could be a long with an assigned number, which could 
be obtained
+by using an
++++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</a>+++.
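The numeric-substitution mapping can be sketched in plain Java with an in-process atomic counter; in HBase, the counter would instead live alongside the LOG_TYPES table and be advanced with the atomic counter call linked above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the numeric-substitution idea: hand out a stable numeric id per
// raw hostname or event-type, as the LOG_TYPES lookup table would via an
// atomic HBase counter. This is an in-process stand-in, not the HBase API.
public class TypeIds {
    private final ConcurrentHashMap<String, Long> assigned = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong();

    long idFor(String rawValue) {
        // The first caller for a value atomically assigns the next number;
        // later callers get the same id back.
        return assigned.computeIfAbsent(rawValue, v -> next.incrementAndGet());
    }

    public static void main(String[] args) {
        TypeIds ids = new TypeIds();
        System.out.println(ids.idFor("host-a.example")); // 1
        System.out.println(ids.idFor("host-b.example")); // 2
        System.out.println(ids.idFor("host-a.example")); // 1 again: stable mapping
    }
}
```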
 
 So the resulting composite rowkey would be:
 
@@ -739,7 +802,9 @@ In either the Hash or Numeric substitution approach, the 
raw values for hostname
 
 This effectively is the OpenTSDB approach.
 What OpenTSDB does is re-write data and pack rows into columns for certain 
time-periods.
-For a detailed explanation, see: link:http://opentsdb.net/schema.html, and 
link:http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html[Lessons
 Learned from OpenTSDB] from HBaseCon2012.
+For a detailed explanation, see: http://opentsdb.net/schema.html, and
++++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
+from HBaseCon2012.
 
 But this is how the general concept works: data is ingested, for example, in 
this manner...
 
@@ -784,7 +849,7 @@ Assuming that the combination of customer number and sales 
order uniquely identi
 [customer number][order number]
 ----
 
-for a ORDER table.
+for an ORDER table.
 However, there are more design decisions to make: are the _raw_ values the 
best choices for rowkeys?
 
 The same design questions in the Log Data use-case confront us here.
@@ -842,14 +907,14 @@ The ORDER table's rowkey was described above: 
<<schema.casestudies.custorder,sch
 
 The SHIPPING_LOCATION's composite rowkey would be something like this:
 
-* [order-rowkey]
-* [shipping location number] (e.g., 1st location, 2nd, etc.)
+* `[order-rowkey]`
+* `[shipping location number]` (e.g., 1st location, 2nd, etc.)
 
 The LINE_ITEM table's composite rowkey would be something like this:
 
-* [order-rowkey]
-* [shipping location number] (e.g., 1st location, 2nd, etc.)
-* [line item number] (e.g., 1st lineitem, 2nd, etc.)
+* `[order-rowkey]`
+* `[shipping location number]` (e.g., 1st location, 2nd, etc.)
+* `[line item number]` (e.g., 1st lineitem, 2nd, etc.)
 
 Such a normalized model is likely to be the approach with an RDBMS, but that's 
not your only option with HBase.
 The cons of such an approach is that to retrieve information about any Order, 
you will need:
@@ -867,21 +932,21 @@ With this approach, there would exist a single table 
ORDER that would contain
 
 The Order rowkey was described above: 
<<schema.casestudies.custorder,schema.casestudies.custorder>>
 
-* [order-rowkey]
-* [ORDER record type]
+* `[order-rowkey]`
+* `[ORDER record type]`
 
 The ShippingLocation composite rowkey would be something like this:
 
-* [order-rowkey]
-* [SHIPPING record type]
-* [shipping location number] (e.g., 1st location, 2nd, etc.)
+* `[order-rowkey]`
+* `[SHIPPING record type]`
+* `[shipping location number]` (e.g., 1st location, 2nd, etc.)
 
 The LineItem composite rowkey would be something like this:
 
-* [order-rowkey]
-* [LINE record type]
-* [shipping location number] (e.g., 1st location, 2nd, etc.)
-* [line item number] (e.g., 1st lineitem, 2nd, etc.)
+* `[order-rowkey]`
+* `[LINE record type]`
+* `[shipping location number]` (e.g., 1st location, 2nd, etc.)
+* `[line item number]` (e.g., 1st lineitem, 2nd, etc.)
 
 [[schema.casestudies.custorder.obj.denorm]]
 ===== Denormalized
@@ -890,9 +955,9 @@ A variant of the Single Table With Record Types approach is 
to denormalize and f
 
 The LineItem composite rowkey would be something like this:
 
-* [order-rowkey]
-* [LINE record type]
-* [line item number] (e.g., 1st lineitem, 2nd, etc., care must be taken that 
there are unique across the entire order)
+* `[order-rowkey]`
+* `[LINE record type]`
+* `[line item number]` (e.g., 1st lineitem, 2nd, etc.; care must be taken that they are unique across the entire order)
 
 and the LineItem columns would be something like this:
 
@@ -915,9 +980,9 @@ For example, the ORDER table's rowkey was described above: 
<<schema.casestudies.
 
 There are many options here: JSON, XML, Java Serialization, Avro, Hadoop 
Writables, etc.
 All of them are variants of the same approach: encode the object graph to a 
byte-array.
-Care should be taken with this approach to ensure backward compatibilty in 
case the object model changes such that older persisted structures can still be 
read back out of HBase.
+Care should be taken with this approach to ensure backward compatibility in 
case the object model changes such that older persisted structures can still be 
read back out of HBase.
 
-Pros are being able to manage complex object graphs with minimal I/O (e.g., a 
single HBase Get per Order in this example), but the cons include the 
aforementioned warning about backward compatiblity of serialization, language 
dependencies of serialization (e.g., Java Serialization only works with Java 
clients), the fact that you have to deserialize the entire object to get any 
piece of information inside the BLOB, and the difficulty in getting frameworks 
like Hive to work with custom objects like this.
+Pros are being able to manage complex object graphs with minimal I/O (e.g., a 
single HBase Get per Order in this example), but the cons include the 
aforementioned warning about backward compatibility of serialization, language 
dependencies of serialization (e.g., Java Serialization only works with Java 
clients), the fact that you have to deserialize the entire object to get any 
piece of information inside the BLOB, and the difficulty in getting frameworks 
like Hive to work with custom objects like this.
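A minimal round-trip sketch of the BLOB approach, using Java Serialization for brevity (the Order fields are hypothetical; Avro or protobuf would follow the same shape):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of the BLOB approach: serialize a small object graph to a byte[]
// value and read it back. The Order fields are made up for illustration.
public class OrderBlob {
    static class Order implements Serializable {
        final long customerId;
        final long orderId;
        Order(long c, long o) { customerId = c; orderId = o; }
    }

    static byte[] toBytes(Order o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o); // this byte[] would be the cell value in the Put
        }
        return bos.toByteArray();
    }

    static Order fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return (Order) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Order restored = fromBytes(toBytes(new Order(42L, 7L)));
        System.out.println(restored.customerId + " " + restored.orderId); // 42 7
    }
}
```

Note that reading any one field still requires deserializing the whole BLOB, which is exactly the con called out above.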
 
 [[schema.smackdown]]
 === Case Study - "Tall/Wide/Middle" Schema Design Smackdown
@@ -929,7 +994,7 @@ These are general guidelines and not laws - each 
application must consider its o
 ==== Rows vs. Versions
 
 A common question is whether one should prefer rows or HBase's 
built-in-versioning.
-The context is typically where there are "a lot" of versions of a row to be 
retained (e.g., where it is significantly above the HBase default of 1 max 
versions). The rows-approach would require storing a timestamp in some portion 
of the rowkey so that they would not overwite with each successive update.
+The context is typically where there are "a lot" of versions of a row to be 
retained (e.g., where it is significantly above the HBase default of 1 max 
versions). The rows-approach would require storing a timestamp in some portion 
of the rowkey so that they would not overwrite with each successive update.
 
 Preference: Rows (generally speaking).
 
@@ -1028,14 +1093,14 @@ The tl;dr version is that you should probably go with 
one row per user+value, an
 
 Your two options mirror a common question people have when designing HBase 
schemas: should I go "tall" or "wide"? Your first schema is "tall": each row 
represents one value for one user, and so there are many rows in the table for 
each user; the row key is user + valueid, and there would be (presumably) a 
single column qualifier that means "the value". This is great if you want to 
scan over rows in sorted order by row key (thus my question above, about 
whether these ids are sorted correctly). You can start a scan at any 
user+valueid, read the next 30, and be done.
 What you're giving up is the ability to have transactional guarantees around 
all the rows for one user, but it doesn't sound like you need that.
-Doing it this way is generally recommended (see here 
link:http://hbase.apache.org/book.html#schema.smackdown).
+Doing it this way is generally recommended (see here 
http://hbase.apache.org/book.html#schema.smackdown).
 
 Your second option is "wide": you store a bunch of values in one row, using 
different qualifiers (where the qualifier is the valueid). The simple way to do 
that would be to just store ALL values for one user in a single row.
 I'm guessing you jumped to the "paginated" version because you're assuming 
that storing millions of columns in a single row would be bad for performance, 
which may or may not be true; as long as you're not trying to do too much in a 
single request, or do things like scanning over and returning all of the cells 
in the row, it shouldn't be fundamentally worse.
 The client has methods that allow you to get specific slices of columns.
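The slicing idea can be sketched with a sorted map standing in for one wide row's qualifiers; in HBase the equivalent is a qualifier-range restriction on the Get or Scan (for example a ColumnRangeFilter) rather than manual pagination:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of "get a slice of columns" from a wide row. A TreeMap stands in for
// one row's sorted column qualifiers; the qualifier naming is hypothetical.
public class WideRowSlice {
    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            row.put(String.format("value%04d", i), "v" + i); // qualifier -> cell value
        }
        // One "page": the 30 qualifiers starting at value0040, in sorted order,
        // without touching the rest of the row.
        SortedMap<String, String> page = row.subMap("value0040", true, "value0070", false);
        System.out.println(page.size());     // 30
        System.out.println(page.firstKey()); // value0040
    }
}
```

Zero-padding the qualifier keeps the lexicographic order aligned with the numeric order, the same trick discussed for rowkeys.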
 
 Note that neither case fundamentally uses more disk space than the other; 
you're just "shifting" part of the identifying information for a value either 
to the left (into the row key, in option one) or to the right (into the column 
qualifiers in option 2). Under the covers, every key/value still stores the 
whole row key, and column family name.
-(If this is a bit confusing, take an hour and watch Lars George's excellent 
video about understanding HBase schema design: 
link:http://www.youtube.com/watch?v=_HLoH_PgrLk).
+(If this is a bit confusing, take an hour and watch Lars George's excellent 
video about understanding HBase schema design: 
http://www.youtube.com/watch?v=_HLoH_PgrLk).
 
 A manually paginated version has lots more complexities, as you note, like 
having to keep track of how many things are in each page, re-shuffling if new 
values are inserted, etc.
 That seems significantly more complex.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/security.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/security.adoc 
b/src/main/asciidoc/_chapters/security.adoc
index 101affa..c346435 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -42,7 +42,7 @@ HBase provides mechanisms to secure various components and 
aspects of HBase and
 == Using Secure HTTP (HTTPS) for the Web UI
 
 A default HBase install uses insecure HTTP connections for Web UIs for the 
master and region servers.
-To enable secure HTTP (HTTPS) connections instead, set `hadoop.ssl.enabled` to 
`true` in _hbase-site.xml_.
+To enable secure HTTP (HTTPS) connections instead, set `hbase.ssl.enabled` to 
`true` in _hbase-site.xml_.
 This does not change the port used by the Web UI.
 To change the port for the web UI for a given HBase component, configure that 
port's setting in hbase-site.xml.
 These settings are:
@@ -175,6 +175,15 @@ Add the following to the `hbase-site.xml` file for every 
Thrift gateway:
    You may have  to put the concrete full hostname.
    -->
 </property>
+<!-- Add these if you need to configure a different DNS interface from the 
default -->
+<property>
+  <name>hbase.thrift.dns.interface</name>
+  <value>default</value>
+</property>
+<property>
+  <name>hbase.thrift.dns.nameserver</name>
+  <value>default</value>
+</property>
 ----
 
 Substitute the appropriate credential and keytab for _$USER_ and _$KEYTAB_ 
respectively.
@@ -227,39 +236,41 @@ To enable it, do the following.
 
 <<security.gateway.thrift>> describes how to configure the Thrift gateway to 
authenticate to HBase on the client's behalf, and to access HBase using a proxy 
user. The limitation of this approach is that after the client is initialized 
with a particular set of credentials, it cannot change these credentials during 
the session. The `doAs` feature provides a flexible way to impersonate multiple 
principals using the same client. This feature was implemented in 
link:https://issues.apache.org/jira/browse/HBASE-12640[HBASE-12640] for Thrift 
1, but is currently not available for Thrift 2.
 
-*To allow proxy users*, add the following to the _hbase-site.xml_ file for 
every HBase node:
+*To enable the `doAs` feature*, add the following to the _hbase-site.xml_ file 
for every Thrift gateway:
 
 [source,xml]
 ----
 <property>
-  <name>hadoop.security.authorization</name>
+  <name>hbase.regionserver.thrift.http</name>
   <value>true</value>
 </property>
 <property>
-  <name>hadoop.proxyuser.$USER.groups</name>
-  <value>$GROUPS</value>
-</property>
-<property>
-  <name>hadoop.proxyuser.$USER.hosts</name>
-  <value>$GROUPS</value>
+  <name>hbase.thrift.support.proxyuser</name>
+  <value>true</value>
 </property>
 ----
 
-*To enable the `doAs` feature*, add the following to the _hbase-site.xml_ file 
for every Thrift gateway:
+*To allow proxy users* when using `doAs` impersonation, add the following to 
the _hbase-site.xml_ file for every HBase node:
 
 [source,xml]
 ----
 <property>
-  <name>hbase.regionserver.thrift.http</name>
+  <name>hadoop.security.authorization</name>
   <value>true</value>
 </property>
 <property>
-  <name>hbase.thrift.support.proxyuser</name>
-  <value>true/value>
+  <name>hadoop.proxyuser.$USER.groups</name>
+  <value>$GROUPS</value>
+</property>
+<property>
+  <name>hadoop.proxyuser.$USER.hosts</name>
+  <value>$GROUPS</value>
 </property>
 ----
 
-Take a look at the 
link:https://github.com/apache/hbase/blob/master/hbase-examples/src/main/java/org/apache/hadoop/hbase/thrift/HttpDoAsClient.java[demo
 client] to get an overall idea of how to use this feature in your client.
+Take a look at the
+link:https://github.com/apache/hbase/blob/master/hbase-examples/src/main/java/org/apache/hadoop/hbase/thrift/HttpDoAsClient.java[demo
 client]
+to get an overall idea of how to use this feature in your client.
 
 === Client-side Configuration for Secure Operation - REST Gateway
 
@@ -297,6 +308,10 @@ To enable REST gateway Kerberos authentication for client 
access, add the follow
 [source,xml]
 ----
 <property>
+  <name>hbase.rest.support.proxyuser</name>
+  <value>true</value>
+</property>
+<property>
   <name>hbase.rest.authentication.type</name>
   <value>kerberos</value>
 </property>
@@ -308,12 +323,21 @@ To enable REST gateway Kerberos authentication for client 
access, add the follow
   <name>hbase.rest.authentication.kerberos.keytab</name>
   <value>$KEYTAB</value>
 </property>
+<!-- Add these if you need to configure a different DNS interface from the 
default -->
+<property>
+  <name>hbase.rest.dns.interface</name>
+  <value>default</value>
+</property>
+<property>
+  <name>hbase.rest.dns.nameserver</name>
+  <value>default</value>
+</property>
 ----
 
 Substitute the keytab for HTTP for _$KEYTAB_.
 
 HBase REST gateway supports different 'hbase.rest.authentication.type': 
simple, kerberos.
-You can also implement a custom authentication by implemening Hadoop 
AuthenticationHandler, then specify the full class name as 
'hbase.rest.authentication.type' value.
+You can also implement a custom authentication by implementing Hadoop 
AuthenticationHandler, then specify the full class name as 
'hbase.rest.authentication.type' value.
 For more information, refer to 
link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP 
authentication].
 
 [[security.rest.gateway]]
@@ -325,7 +349,7 @@ To the HBase server, all requests are from the REST gateway 
user.
 The actual users are unknown.
 You can turn on the impersonation support.
 With impersonation, the REST gateway user is a proxy user.
-The HBase server knows the acutal/real user of each request.
+The HBase server knows the actual/real user of each request.
 So it can apply proper authorizations.
 
 To turn on REST gateway impersonation, we need to configure HBase servers 
(masters and region servers) to allow proxy users; configure REST gateway to 
enable impersonation.
@@ -504,21 +528,21 @@ This is future work.
 Secure HBase requires secure ZooKeeper and HDFS so that users cannot access 
and/or modify the metadata and data from under HBase. HBase uses HDFS (or 
configured file system) to keep its data files as well as write ahead logs 
(WALs) and other data. HBase uses ZooKeeper to store some metadata for 
operations (master address, table locks, recovery state, etc).
 
 === Securing ZooKeeper Data
-ZooKeeper has a pluggable authentication mechanism to enable access from 
clients using different methods. ZooKeeper even allows authenticated and 
un-authenticated clients at the same time. The access to znodes can be 
restricted by providing Access Control Lists (ACLs) per znode. An ACL contains 
two components, the authentication method and the principal. ACLs are NOT 
enforced hierarchically. See 
link:https://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#sc_ZooKeeperPluggableAuthentication[ZooKeeper
 Programmers Guide] for details. 
+ZooKeeper has a pluggable authentication mechanism to enable access from 
clients using different methods. ZooKeeper even allows authenticated and 
un-authenticated clients at the same time. The access to znodes can be 
restricted by providing Access Control Lists (ACLs) per znode. An ACL contains 
two components, the authentication method and the principal. ACLs are NOT 
enforced hierarchically. See 
link:https://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#sc_ZooKeeperPluggableAuthentication[ZooKeeper
 Programmers Guide] for details.
 
-HBase daemons authenticate to ZooKeeper via SASL and kerberos (See 
<<zk.sasl.auth>>). HBase sets up the znode ACLs so that only the HBase user and 
the configured hbase superuser (`hbase.superuser`) can access and modify the 
data. In cases where ZooKeeper is used for service discovery or sharing state 
with the client, the znodes created by HBase will also allow anyone (regardless 
of authentication) to read these znodes (clusterId, master address, meta 
location, etc), but only the HBase user can modify them. 
+HBase daemons authenticate to ZooKeeper via SASL and kerberos (See 
<<zk.sasl.auth>>). HBase sets up the znode ACLs so that only the HBase user and 
the configured hbase superuser (`hbase.superuser`) can access and modify the 
data. In cases where ZooKeeper is used for service discovery or sharing state 
with the client, the znodes created by HBase will also allow anyone (regardless 
of authentication) to read these znodes (clusterId, master address, meta 
location, etc), but only the HBase user can modify them.
 
 === Securing File System (HDFS) Data
-All of the data under management is kept under the root directory in the file 
system (`hbase.rootdir`). Access to the data and WAL files in the filesystem 
should be restricted so that users cannot bypass the HBase layer, and peek at 
the underlying data files from the file system. HBase assumes the filesystem 
used (HDFS or other) enforces permissions hierarchically. If sufficient 
protection from the file system (both authorization and authentication) is not 
provided, HBase level authorization control (ACLs, visibility labels, etc) is 
meaningless since the user can always access the data from the file system. 
+All of the data under management is kept under the root directory in the file 
system (`hbase.rootdir`). Access to the data and WAL files in the filesystem 
should be restricted so that users cannot bypass the HBase layer, and peek at 
the underlying data files from the file system. HBase assumes the filesystem 
used (HDFS or other) enforces permissions hierarchically. If sufficient 
protection from the file system (both authorization and authentication) is not 
provided, HBase level authorization control (ACLs, visibility labels, etc) is 
meaningless since the user can always access the data from the file system.
 
 HBase enforces the posix-like permissions 700 (`rwx------`) to its root 
directory. It means that only the HBase user can read or write the files in FS. 
The default setting can be changed by configuring `hbase.rootdir.perms` in 
hbase-site.xml. A restart of the active master is needed so that it changes the 
used permissions. For versions before 1.2.0, you can check whether HBASE-13780 
is committed, and if not, you can manually set the permissions for the root 
directory if needed. Using HDFS, the command would be:
 [source,bash]
 ----
 sudo -u hdfs hadoop fs -chmod 700 /hbase
 ----
-You should change `/hbase` if you are using a different `hbase.rootdir`. 
+You should change `/hbase` if you are using a different `hbase.rootdir`.
 
-In secure mode, SecureBulkLoadEndpoint should be configured and used for 
properly handing of users files created from MR jobs to the HBase daemons and 
HBase user. The staging directory in the distributed file system used for bulk 
load (`hbase.bulkload.staging.dir`, defaults to `/tmp/hbase-staging`) should 
have (mode 711, or `rwx--x--x`) so that users can access the staging directory 
created under that parent directory, but cannot do any other operation. See 
<<hbase.secure.bulkload>> for how to configure SecureBulkLoadEndPoint. 
+In secure mode, SecureBulkLoadEndpoint should be configured and used to properly hand off files created by users' MR jobs to the HBase daemons and HBase user. The staging directory in the distributed file system used for bulk load (`hbase.bulkload.staging.dir`, defaults to `/tmp/hbase-staging`) should have mode 711, or `rwx--x--x`, so that users can access the staging directory created under that parent directory, but cannot do any other operation. See <<hbase.secure.bulkload>> for how to configure SecureBulkLoadEndPoint.
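As a quick illustration of what the modes 700 and 711 above mean (a plain Python sketch, independent of HDFS or HBase), here is how the octal modes expand into their `rwx` strings:

```python
import stat

def mode_string(mode):
    """Render a numeric permission mode as the familiar rwx string."""
    bits = [
        (stat.S_IRUSR, "r"), (stat.S_IWUSR, "w"), (stat.S_IXUSR, "x"),
        (stat.S_IRGRP, "r"), (stat.S_IWGRP, "w"), (stat.S_IXGRP, "x"),
        (stat.S_IROTH, "r"), (stat.S_IWOTH, "w"), (stat.S_IXOTH, "x"),
    ]
    return "".join(ch if mode & bit else "-" for bit, ch in bits)

# 711 lets non-owners traverse into the staging directory (x) without
# being able to list it (no r) or create files in it (no w).
print(mode_string(0o700))  # rwx------
print(mode_string(0o711))  # rwx--x--x
```

Mode 711 lets other users traverse into the staging directory without listing or modifying it, which is exactly the property secure bulk load relies on.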
 
 == Securing Access To Your Data
 
@@ -1099,7 +1123,7 @@ NOTE: Visibility labels are not currently applied for 
superusers.
 | Interpretation
 
 | fulltime
-| Allow accesss to users associated with the fulltime label.
+| Allow access to users associated with the fulltime label.
 
 | !public
 | Allow access to users not associated with the public label.
@@ -1314,11 +1338,21 @@ static Table 
createTableAndWriteDataWithLabels(TableName tableName, String... la
 ----
 ====
 
-<<reading_cells_with_labels>>
+[[reading_cells_with_labels]]
 ==== Reading Cells with Labels
-When you issue a Scan or Get, HBase uses your default set of authorizations to 
filter out cells that you do not have access to. A superuser can set the 
default set of authorizations for a given user by using the `set_auths` HBase 
Shell command or the 
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.conf.Configuration,%20java.lang.String\[\],%20java.lang.String)[VisibilityClient.setAuths()]
 method.
 
-You can specify a different authorization during the Scan or Get, by passing 
the AUTHORIZATIONS option in HBase Shell, or the 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()]
 method if you use the API. This authorization will be combined with your 
default set as an additional filter. It will further filter your results, 
rather than giving you additional authorization.
+When you issue a Scan or Get, HBase uses your default set of authorizations to
+filter out cells that you do not have access to. A superuser can set the 
default
+set of authorizations for a given user by using the `set_auths` HBase Shell 
command
+or the
+link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.hbase.client.Connection,%20java.lang.String\[\],%20java.lang.String)[VisibilityClient.setAuths()]
 method.
+
+You can specify a different authorization during the Scan or Get, by passing 
the
+AUTHORIZATIONS option in HBase Shell, or the
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()]
+method if you use the API. This authorization will be combined with your 
default
+set as an additional filter. It will further filter your results, rather than
+giving you additional authorization.
 
 .HBase Shell
 ====
@@ -1564,7 +1598,10 @@ Rotate the Master Key::
 === Secure Bulk Load
 
 Bulk loading in secure mode is a bit more involved than normal setup, since 
the client has to transfer the ownership of the files generated from the 
MapReduce job to HBase.
-Secure bulk loading is implemented by a coprocessor, named 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
 which uses a staging directory configured by the configuration property 
`hbase.bulkload.staging.dir`, which defaults to _/tmp/hbase-staging/_.
+Secure bulk loading is implemented by a coprocessor, named
+link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
+which uses a staging directory configured by the configuration property 
`hbase.bulkload.staging.dir`, which defaults to
+_/tmp/hbase-staging/_.
 
 .Secure Bulk Load Algorithm
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/shell.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/shell.adoc 
b/src/main/asciidoc/_chapters/shell.adoc
index 237089e..a4237fd 100644
--- a/src/main/asciidoc/_chapters/shell.adoc
+++ b/src/main/asciidoc/_chapters/shell.adoc
@@ -76,7 +76,7 @@ NOTE: Spawning HBase Shell commands in this way is slow, so 
keep that in mind wh
 
 .Passing Commands to the HBase Shell
 ====
-You can pass commands to the HBase Shell in non-interactive mode (see 
<<hbasee.shell.noninteractive,hbasee.shell.noninteractive>>) using the `echo` 
command and the `|` (pipe) operator.
+You can pass commands to the HBase Shell in non-interactive mode (see 
<<hbase.shell.noninteractive,hbase.shell.noninteractive>>) using the `echo` 
command and the `|` (pipe) operator.
 Be sure to escape characters in the HBase commands which would otherwise be 
interpreted by the shell.
 Some debug-level output has been truncated from the example below.
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/spark.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/spark.adoc 
b/src/main/asciidoc/_chapters/spark.adoc
new file mode 100644
index 0000000..37503e9
--- /dev/null
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -0,0 +1,451 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[spark]]
+= HBase and Spark
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+link:http://spark.apache.org/[Apache Spark] is a software framework that is 
used
+to process data in memory in a distributed manner, and is replacing MapReduce 
in
+many use cases.
+
+Spark itself is out of scope of this document; please refer to the Spark site for more information on the Spark project and subprojects. This document will focus on four main interaction points between Spark and HBase. Those interaction points are:
+
+Basic Spark::
+  The ability to have an HBase Connection at any point in your Spark DAG.
+Spark Streaming::
+  The ability to have an HBase Connection at any point in your Spark Streaming
+  application.
+Spark Bulk Load::
+  The ability to write directly to HBase HFiles for bulk insertion into HBase
+SparkSQL/DataFrames::
+  The ability to write SparkSQL that draws on tables that are represented in 
HBase.
+
+The following sections will walk through examples of all these interaction 
points.
+
+== Basic Spark
+
+This section discusses Spark HBase integration at the lowest and simplest 
levels.
+All the other interaction points are built upon the concepts that will be 
described
+here.
+
+At the root of all Spark and HBase integration is the HBaseContext. The 
HBaseContext
+takes in HBase configurations and pushes them to the Spark executors. This 
allows
+us to have an HBase Connection per Spark Executor in a static location.
+
+For reference, Spark Executors can be on the same nodes as the Region Servers or on different nodes; there is no dependence on co-location. Think of every Spark Executor as a multi-threaded client application. This allows any Spark Tasks running on the executors to access the shared Connection object.
+
+.HBaseContext Usage Example
+====
+
+This example shows how HBaseContext can be used to do a `foreachPartition` on an RDD in Scala:
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+...
+
+val hbaseContext = new HBaseContext(sc, config)
+
+rdd.hbaseForeachPartition(hbaseContext, (it, conn) => {
+  val bufferedMutator = conn.getBufferedMutator(TableName.valueOf("t1"))
+  it.foreach((putRecord) => {
+    val put = new Put(putRecord._1)
+    putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, putValue._3))
+    bufferedMutator.mutate(put)
+  })
+  bufferedMutator.flush()
+  bufferedMutator.close()
+})
+----
+
+Here is the same example implemented in Java:
+
+[source, java]
+----
+JavaSparkContext jsc = new JavaSparkContext(sparkConf);
+
+try {
+  List<byte[]> list = new ArrayList<>();
+  list.add(Bytes.toBytes("1"));
+  ...
+  list.add(Bytes.toBytes("5"));
+
+  JavaRDD<byte[]> rdd = jsc.parallelize(list);
+  Configuration conf = HBaseConfiguration.create();
+
+  JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
+
+  hbaseContext.foreachPartition(rdd,
+      new VoidFunction<Tuple2<Iterator<byte[]>, Connection>>() {
+   public void call(Tuple2<Iterator<byte[]>, Connection> t)
+        throws Exception {
+    Table table = t._2().getTable(TableName.valueOf(tableName));
+    BufferedMutator mutator = 
t._2().getBufferedMutator(TableName.valueOf(tableName));
+    while (t._1().hasNext()) {
+      byte[] b = t._1().next();
+      Result r = table.get(new Get(b));
+      if (r.getExists()) {
+       mutator.mutate(new Put(b));
+      }
+    }
+
+    mutator.flush();
+    mutator.close();
+    table.close();
+   }
+  });
+} finally {
+  jsc.stop();
+}
+----
+====
+
+All functionality between Spark and HBase will be supported both in Scala and in Java, with the exception of SparkSQL, which will support any language that is supported by Spark. For the remainder of this documentation we will focus on Scala examples.
+
+The examples above illustrate how to do a `foreachPartition` with a connection. A number of other Spark base functions are supported out of the box:
+
+// tag::spark_base_functions[]
+`bulkPut`:: For massively parallel sending of puts to HBase
+`bulkDelete`:: For massively parallel sending of deletes to HBase
+`bulkGet`:: For massively parallel sending of gets to HBase to create a new RDD
+`mapPartition`:: To do a Spark Map function with a Connection object to allow 
full
+access to HBase
+`hBaseRDD`:: To simplify a distributed scan to create an RDD
+// end::spark_base_functions[]
+
+For examples of all these functionalities, see the HBase-Spark Module.
+
+== Spark Streaming
+http://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream processing framework built on top of Spark. HBase and Spark Streaming make great companions in that HBase can provide the following benefits alongside Spark Streaming.
+
+* A place to grab reference data or profile data on the fly
+* A place to store counts or aggregates in a way that supports Spark Streaming's promise of _only once processing_.
+
+The HBase-Spark module’s integration points with Spark Streaming are similar 
to
+its normal Spark integration points, in that the following commands are 
possible
+straight off a Spark Streaming DStream.
+
+include::spark.adoc[tags=spark_base_functions]
+
+.`bulkPut` Example with DStreams
+====
+
+Below is an example of bulkPut with DStreams. It is very close in feel to the 
RDD
+bulk put.
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+val ssc = new StreamingContext(sc, Milliseconds(200))
+
+val rdd1 = ...
+val rdd2 = ...
+
+val queue = mutable.Queue[RDD[(Array[Byte], Array[(Array[Byte],
+    Array[Byte], Array[Byte])])]]()
+
+queue += rdd1
+queue += rdd2
+
+val dStream = ssc.queueStream(queue)
+
+dStream.hbaseBulkPut(
+  hbaseContext,
+  TableName.valueOf(tableName),
+  (putRecord) => {
+   val put = new Put(putRecord._1)
+   putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, 
putValue._3))
+   put
+  })
+----
+
+There are three inputs to the `hbaseBulkPut` function.
+
+. The `hbaseContext` that carries the configuration broadcast information that links us to the HBase Connections in the executors
+. The table name of the table we are putting data into
+. A function that will convert a record in the DStream into an HBase Put object.
+====
+
+== Bulk Load
+
+Spark bulk load follows the MapReduce implementation of bulk load very closely. In short, a partitioner partitions based on region splits and the row keys are sent to the reducers in order, so that HFiles can be written out. In Spark terms, the bulk load will be focused around a `repartitionAndSortWithinPartitions` followed by a `foreachPartition`.
+
+The only major difference between the Spark implementation and the MapReduce implementation is that the column qualifier is included in the shuffle ordering process. This was done because the MapReduce bulk load implementation would have memory issues when loading rows with large numbers of columns, as a result of the sorting of those columns being done in the memory of the reducer JVM. Instead, that ordering is done in the Spark Shuffle, so there should no longer be a limit to the number of columns in a row for bulk loading.
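The partition-then-sort behaviour described above can be sketched outside of Spark. The following is an illustrative Python sketch (not the HBase-Spark API): it partitions cells by row key against region split points and sorts each partition by the full (rowKey, family, qualifier) triple, which is what `repartitionAndSortWithinPartitions` achieves at cluster scale:

```python
from collections import defaultdict

def partition_and_sort(cells, region_splits):
    """cells: list of (row_key, family, qualifier, value) tuples.
    region_splits: sorted list of region start keys (first region implied)."""
    partitions = defaultdict(list)
    for cell in cells:
        row_key = cell[0]
        # Find the region whose key range contains this row key.
        idx = sum(1 for split in region_splits if row_key >= split)
        partitions[idx].append(cell)
    # Sort within each partition by (rowKey, family, qualifier) so HFiles
    # can be written out in order without buffering a row's columns in memory.
    return {p: sorted(v, key=lambda c: (c[0], c[1], c[2]))
            for p, v in partitions.items()}

cells = [("r2", "cf", "q2", "v"), ("r1", "cf", "q1", "v"),
         ("r2", "cf", "q1", "v")]
result = partition_and_sort(cells, region_splits=["r2"])
# Partition 0 holds r1; partition 1 holds r2's cells sorted by qualifier.
```

Spark performs the same grouping and ordering in its shuffle, which is why the column sort no longer has to fit in a single reducer JVM.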
+
+.Bulk Loading Example
+====
+
+The following example shows bulk loading with Spark.
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+
+rdd.hbaseBulkLoad(TableName.valueOf(tableName),
+  t => {
+   val rowKey = t._1
+   val family:Array[Byte] = t._2(0)._1
+   val qualifier = t._2(0)._2
+   val value = t._2(0)._3
+
+   val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier)
+
+   Seq((keyFamilyQualifier, value)).iterator
+  },
+  stagingFolder.getPath)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+The `hbaseBulkLoad` function takes three required parameters:
+
+. The table name of the table we intend to bulk load to
+
+. A function that will convert a record in the RDD to a tuple key value pair, with the tuple key being a KeyFamilyQualifier object and the value being the cell value. The KeyFamilyQualifier object will hold the RowKey, Column Family, and Column Qualifier. The shuffle will partition on the RowKey but will sort by all three values.
+
+. The temporary path for the HFile to be written out to
+
+Following the Spark bulk load command, use HBase's LoadIncrementalHFiles object to load the newly created HFiles into HBase.
+
+.Additional Parameters for Bulk Loading with Spark
+
+You can set the following attributes with additional parameter options on 
hbaseBulkLoad.
+
+* Max file size of the HFiles
+* A flag to exclude HFiles from compactions
+* Column Family settings for compression, bloomType, blockSize, and 
dataBlockEncoding
+
+.Using Additional Parameters
+====
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+
+val familyHBaseWriterOptions = new java.util.HashMap[Array[Byte], 
FamilyHFileWriteOptions]
+val f1Options = new FamilyHFileWriteOptions("GZ", "ROW", 128, "PREFIX")
+
+familyHBaseWriterOptions.put(Bytes.toBytes("columnFamily1"), f1Options)
+
+rdd.hbaseBulkLoad(TableName.valueOf(tableName),
+  t => {
+   val rowKey = t._1
+   val family:Array[Byte] = t._2(0)._1
+   val qualifier = t._2(0)._2
+   val value = t._2(0)._3
+
+   val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier)
+
+   Seq((keyFamilyQualifier, value)).iterator
+  },
+  stagingFolder.getPath,
+  familyHBaseWriterOptions,
+  compactionExclude = false,
+  HConstants.DEFAULT_MAX_FILE_SIZE)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+== SparkSQL/DataFrames
+
+http://spark.apache.org/sql/[SparkSQL] is a subproject of Spark that supports SQL that will compute down to a Spark DAG. In addition, SparkSQL is a heavy user of DataFrames. DataFrames are like RDDs with schema information.
+
+The HBase-Spark module includes support for Spark SQL and DataFrames, which allows you to write SparkSQL directly on HBase tables. In addition, the HBase-Spark module will push down query filtering logic to HBase.
+
+=== Predicate Push Down
+
+There are two examples of predicate push down in the HBase-Spark 
implementation.
+The first example shows the push down of filtering logic on the RowKey. 
HBase-Spark
+will reduce the filters on RowKeys down to a set of Get and/or Scan commands.
+
+NOTE: The Scans are distributed scans, rather than a single client scan 
operation.
+
+If the query looks something like the following, the logic will push down and 
get
+the rows through 3 Gets and 0 Scans. We can do gets because all the operations
+are `equal` operations.
+
+[source,sql]
+----
+SELECT
+  KEY_FIELD,
+  B_FIELD,
+  A_FIELD
+FROM hbaseTmp
+WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
+----
+
+Now let's look at an example where we will end up doing two scans on HBase.
+
+[source, sql]
+----
+SELECT
+  KEY_FIELD,
+  B_FIELD,
+  A_FIELD
+FROM hbaseTmp
+WHERE KEY_FIELD < 'get2' or KEY_FIELD > 'get3'
+----
+
+In this example we will get 0 Gets and 2 Scans. One scan will load everything from the first row in the table until 'get2' and the second scan will get everything from 'get3' until the last row in the table.
+
+The next query is a good example of having a good deal of range checks. However, the ranges overlap. The code will be smart enough to fetch all of the data asked for by the query in a single scan that encompasses the combined range.
+
+[source, sql]
+----
+SELECT
+  KEY_FIELD,
+  B_FIELD,
+  A_FIELD
+FROM hbaseTmp
+WHERE
+  (KEY_FIELD >= 'get1' and KEY_FIELD <= 'get3') or
+  (KEY_FIELD > 'get3' and KEY_FIELD <= 'get5')
+----
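A minimal sketch of the range consolidation idea (plain Python, not the actual HBase-Spark code, and ignoring open/closed endpoint details): overlapping or touching key ranges collapse into the minimum number of scans:

```python
def consolidate_ranges(ranges):
    """Merge overlapping or touching [start, stop] key ranges into the
    minimum number of scans. Ranges are inclusive string-key intervals."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(r) for r in merged]

# The two WHERE clauses above: get1..get3 and get3..get5.
print(consolidate_ranges([("get1", "get3"), ("get3", "get5")]))
# -> [('get1', 'get5')]  a single scan covers both predicates
```

Disjoint ranges, like the earlier `< 'get2' or > 'get3'` query, stay separate and produce one scan each.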
+
+The second example of push down functionality offered by the HBase-Spark module is the ability to push down filter logic for column and cell fields. Just like the RowKey logic, all query logic will be consolidated into the minimum number of range checks and equal checks by sending a Filter object along with the Scan with information about the consolidated push down predicates.
+
+.SparkSQL Code Example
+====
+This example shows how we can interact with HBase with SQL.
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+new HBaseContext(sc, config)
+val sqlContext = new SQLContext(sc)
+
+val df = sqlContext.load("org.apache.hadoop.hbase.spark",
+  Map("hbase.columns.mapping" ->
+   "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b",
+   "hbase.table" -> "t1"))
+
+df.registerTempTable("hbaseTmp")
+
+val results = sqlContext.sql("SELECT KEY_FIELD, B_FIELD FROM hbaseTmp " +
+  "WHERE " +
+  "(KEY_FIELD = 'get1' and B_FIELD < '3') or " +
+  "(KEY_FIELD >= 'get3' and B_FIELD = '8')").take(5)
+----
+
+There are three major parts of this example that deserve explaining.
+
+The sqlContext.load function::
+  In the sqlContext.load function we see two parameters. The first of these parameters points Spark to the HBase DefaultSource class that will act as the interface between SparkSQL and HBase.
+
+A map of key value pairs::
+  In this example we have two keys in our map, `hbase.columns.mapping` and
+  `hbase.table`. The `hbase.table` directs SparkSQL to use the given HBase 
table.
+  The `hbase.columns.mapping` key gives us the logic to translate HBase columns to SparkSQL columns.
++
+The `hbase.columns.mapping` is a string that follows this format:
++
+[source, scala]
+----
+(SparkSQL.ColumnName) (SparkSQL.ColumnType) 
(HBase.ColumnFamily):(HBase.Qualifier)
+----
++
+In the example below we see the definition of three fields. Because KEY_FIELD 
has
+no ColumnFamily, it is the RowKey.
++
+----
+KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b
+----
+
+The registerTempTable function::
+  This SparkSQL function lets us access our HBase table directly with SQL, free of Scala, under the table name "hbaseTmp".
+
+The last major point to note in the example is the `sqlContext.sql` function, which allows the user to ask their questions in SQL; these will be pushed down to the DefaultSource code in the HBase-Spark module. The result of this command will be a DataFrame with the schema of KEY_FIELD and B_FIELD.
+====
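For illustration, the `hbase.columns.mapping` format described above can be parsed with a short sketch (plain Python, a hypothetical helper, not part of the HBase-Spark module):

```python
def parse_columns_mapping(mapping):
    """Parse 'NAME TYPE family:qualifier' triples separated by commas.
    A column whose family part is empty (':key') maps to the row key."""
    columns = {}
    for entry in mapping.split(","):
        name, col_type, hbase_col = entry.split()
        family, _, qualifier = hbase_col.partition(":")
        columns[name] = {
            "type": col_type,
            "family": family or None,   # empty family -> row key column
            "qualifier": qualifier,
        }
    return columns

mapping = "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b"
cols = parse_columns_mapping(mapping)
# KEY_FIELD has no column family, so it represents the RowKey.
print(cols["KEY_FIELD"]["family"])  # None
```

This is only a reading aid for the format; the real parsing happens inside the module's DefaultSource.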
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/thrift_filter_language.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/thrift_filter_language.adoc 
b/src/main/asciidoc/_chapters/thrift_filter_language.adoc
index 744cec6..da36cea 100644
--- a/src/main/asciidoc/_chapters/thrift_filter_language.adoc
+++ b/src/main/asciidoc/_chapters/thrift_filter_language.adoc
@@ -31,7 +31,6 @@
 Apache link:http://thrift.apache.org/[Thrift] is a cross-platform, 
cross-language development framework.
 HBase includes a Thrift API and filter language.
 The Thrift API relies on client and server processes.
-Documentation about the HBase Thrift API is located at 
http://wiki.apache.org/hadoop/Hbase/ThriftApi.
 
 You can configure Thrift for secure authentication at the server and client 
side, by following the procedures in <<security.client.thrift>> and 
<<security.gateway.thrift>>.
 
@@ -250,7 +249,7 @@ RowFilter::
 
 Family Filter::
   This filter takes a compare operator and a comparator.
-  It compares each qualifier name with the comparator using the compare 
operator and if the comparison returns true, it returns all the key-values in 
that column.
+  It compares each column family name with the comparator using the compare 
operator and if the comparison returns true, it returns all the Cells in that 
column family.
 
 QualifierFilter::
   This filter takes a compare operator and a comparator.

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/tracing.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/tracing.adoc 
b/src/main/asciidoc/_chapters/tracing.adoc
index 6bb8065..0cddd8a 100644
--- a/src/main/asciidoc/_chapters/tracing.adoc
+++ b/src/main/asciidoc/_chapters/tracing.adoc
@@ -30,13 +30,13 @@
 :icons: font
 :experimental:
 
-link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added 
support for tracing requests through HBase, using the open source tracing 
library, link:http://github.com/cloudera/htrace[HTrace].
-Setting up tracing is quite simple, however it currently requires some very 
minor changes to your client code (it would not be very difficult to remove 
this requirement). 
+link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added 
support for tracing requests through HBase, using the open source tracing 
library, link:http://htrace.incubator.apache.org/[HTrace].
+Setting up tracing is quite simple, however it currently requires some very 
minor changes to your client code (it would not be very difficult to remove 
this requirement).
 
 [[tracing.spanreceivers]]
 === SpanReceivers
 
-The tracing system works by collecting information in structs called 'Spans'. 
It is up to you to choose how you want to receive this information by 
implementing the `SpanReceiver` interface, which defines one method: 
+The tracing system works by collecting information in structures called 
'Spans'. It is up to you to choose how you want to receive this information by 
implementing the `SpanReceiver` interface, which defines one method:
 
 [source]
 ----
@@ -45,68 +45,55 @@ public void receiveSpan(Span span);
 ----
 
 This method serves as a callback whenever a span is completed.
-HTrace allows you to use as many SpanReceivers as you want so you can easily 
send trace information to multiple destinations. 
+HTrace allows you to use as many SpanReceivers as you want so you can easily 
send trace information to multiple destinations.
 
-Configure what SpanReceivers you'd like to us by putting a comma separated 
list of the fully-qualified class name of classes implementing `SpanReceiver` 
in _hbase-site.xml_ property: `hbase.trace.spanreceiver.classes`. 
+Configure what SpanReceivers you'd like to use by putting a comma separated list of the fully-qualified class names of classes implementing `SpanReceiver` in the _hbase-site.xml_ property `hbase.trace.spanreceiver.classes`.
 
 HTrace includes a `LocalFileSpanReceiver` that writes all span information to 
local files in a JSON-based format.
-The `LocalFileSpanReceiver` looks in _hbase-site.xml_      for a 
`hbase.local-file-span-receiver.path` property with a value describing the name 
of the file to which nodes should write their span information. 
+The `LocalFileSpanReceiver` looks in _hbase-site.xml_      for a 
`hbase.local-file-span-receiver.path` property with a value describing the name 
of the file to which nodes should write their span information.
 
 [source]
 ----
 
 <property>
   <name>hbase.trace.spanreceiver.classes</name>
-  <value>org.htrace.impl.LocalFileSpanReceiver</value>
+  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
 </property>
 <property>
-  <name>hbase.local-file-span-receiver.path</name>
+  <name>hbase.htrace.local-file-span-receiver.path</name>
   <value>/var/log/hbase/htrace.out</value>
 </property>
 ----
 
-HTrace also provides `ZipkinSpanReceiver` which converts spans to 
link:http://github.com/twitter/zipkin[Zipkin] span format and send them to 
Zipkin server.
-In order to use this span receiver, you need to install the jar of 
htrace-zipkin to your HBase's classpath on all of the nodes in your cluster. 
+HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and sends them to a Zipkin server. In order to use this span receiver, you need to install the jar of htrace-zipkin on your HBase classpath on all of the nodes in your cluster.
 
-_htrace-zipkin_ is published to the maven central repository.
-You could get the latest version from there or just build it locally and then 
copy it out to all nodes, change your config to use zipkin receiver, distribute 
the new configuration and then (rolling) restart. 
+_htrace-zipkin_ is published to the 
link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven
 central repository]. You could get the latest version from there or just build 
it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage 
for information on how to do this) and then copy it out to all nodes.
 
-Here is the example of manual setup procedure. 
-
-----
-
-$ git clone https://github.com/cloudera/htrace
-$ cd htrace/htrace-zipkin
-$ mvn compile assembly:single
-$ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HBASE_HOME/lib/
-  # copy jar to all nodes...
-----
-
-The `ZipkinSpanReceiver` looks in _hbase-site.xml_      for a 
`hbase.zipkin.collector-hostname` and `hbase.zipkin.collector-port` property 
with a value describing the Zipkin collector server to which span information 
are sent. 
+The `ZipkinSpanReceiver` looks for properties called `hbase.htrace.zipkin.collector-hostname` and `hbase.htrace.zipkin.collector-port` in _hbase-site.xml_ with values describing the Zipkin collector server to which span information is sent.
 
 [source,xml]
 ----
 
 <property>
   <name>hbase.trace.spanreceiver.classes</name>
-  <value>org.htrace.impl.ZipkinSpanReceiver</value>
-</property> 
+  <value>org.apache.htrace.impl.ZipkinSpanReceiver</value>
+</property>
 <property>
-  <name>hbase.zipkin.collector-hostname</name>
+  <name>hbase.htrace.zipkin.collector-hostname</name>
   <value>localhost</value>
-</property> 
+</property>
 <property>
-  <name>hbase.zipkin.collector-port</name>
+  <name>hbase.htrace.zipkin.collector-port</name>
   <value>9410</value>
 </property>
 ----
 
-If you do not want to use the included span receivers, you are encouraged to 
write your own receiver (take a look at `LocalFileSpanReceiver` for an 
example). If you think others would benefit from your receiver, file a JIRA or 
send a pull request to link:http://github.com/cloudera/htrace[HTrace]. 
+If you do not want to use the included span receivers, you are encouraged to 
write your own receiver (take a look at `LocalFileSpanReceiver` for an 
example). If you think others would benefit from your receiver, file a JIRA 
with the HTrace project.
 
 [[tracing.client.modifications]]
 == Client Modifications
 
-In order to turn on tracing in your client code, you must initialize the 
module sending spans to receiver once per client process. 
+In order to turn on tracing in your client code, you must initialize the 
module sending spans to receiver once per client process.
 
 [source,java]
 ----
@@ -120,7 +107,7 @@ private SpanReceiverHost spanReceiverHost;
 ----
 
 Then you simply start tracing span before requests you think are interesting, 
and close it when the request is done.
-For example, if you wanted to trace all of your get operations, you change 
this: 
+For example, if you wanted to trace all of your get operations, you change 
this:
 
 [source,java]
 ----
@@ -131,7 +118,7 @@ Get get = new Get(Bytes.toBytes("r1"));
 Result res = table.get(get);
 ----
 
-into: 
+into:
 
 [source,java]
 ----
@@ -146,7 +133,7 @@ try {
 }
 ----
 
-If you wanted to trace half of your 'get' operations, you would pass in: 
+If you wanted to trace half of your 'get' operations, you would pass in:
 
 [source,java]
 ----
@@ -155,13 +142,12 @@ new ProbabilitySampler(0.5)
 ----
 
 in lieu of `Sampler.ALWAYS` to `Trace.startSpan()`.
-See the HTrace _README_ for more information on Samplers. 
+See the HTrace _README_ for more information on Samplers.
 
 [[tracing.client.shell]]
 == Tracing from HBase Shell
 
-You can use +trace+ command for tracing requests from HBase Shell. +trace 
'start'+ command turns on tracing and +trace
-        'stop'+ command turns off tracing. 
+You can use the `trace` command for tracing requests from the HBase Shell. The `trace 'start'` command turns on tracing and the `trace 'stop'` command turns off tracing.
 
 [source]
 ----
@@ -171,9 +157,8 @@ hbase(main):002:0> put 'test', 'row1', 'f:', 'val1'   # 
traced commands
 hbase(main):003:0> trace 'stop'
 ----
 
-+trace 'start'+ and +trace 'stop'+ always returns boolean value representing 
if or not there is ongoing tracing.
-As a result, +trace
-        'stop'+ returns false on suceess. +trace 'status'+ just returns if or 
not tracing is turned on. 
+`trace 'start'` and `trace 'stop'` always return a boolean value representing 
whether or not there is ongoing tracing.
+As a result, `trace 'stop'` returns false on success. `trace 'status'` just 
returns whether or not tracing is turned on.
 
 [source]
 ----

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc 
b/src/main/asciidoc/_chapters/troubleshooting.adoc
index 1776c9e..e372760 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -89,11 +89,11 @@ Additionally, each DataNode server will also have a 
TaskTracker/NodeManager log
 [[rpc.logging]]
 ==== Enabling RPC-level logging
 
-Enabling the RPC-level logging on a RegionServer can often given insight on 
timings at the server.
+Enabling the RPC-level logging on a RegionServer can often give insight on 
timings at the server.
 Once enabled, the amount of log spewed is voluminous.
 It is not recommended that you leave this logging on for more than short 
bursts of time.
 To enable RPC-level logging, browse to the RegionServer UI and click on _Log 
Level_.
-Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (Thats 
right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
+Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (That's 
right, for `hadoop.ipc`, NOT `hbase.ipc`). Then tail the RegionServer's log.
 Analyze.
 
 To disable, set the logging level back to `INFO` level.
@@ -185,7 +185,7 @@ The key points here is to keep all these pauses low.
 CMS pauses are always low, but if your ParNew starts growing, you can see 
minor GC pauses approach 100ms, exceed 100ms and hit as high at 400ms.
 
 This can be due to the size of the ParNew, which should be relatively small.
-If your ParNew is very large after running HBase for a while, in one example a 
ParNew was about 150MB, then you might have to constrain the size of ParNew 
(The larger it is, the longer the collections take but if its too small, 
objects are promoted to old gen too quickly). In the below we constrain new gen 
size to 64m.
+If your ParNew is very large after running HBase for a while (in one example a 
ParNew was about 150MB), then you might have to constrain the size of ParNew 
(the larger it is, the longer the collections take, but if it's too small, 
objects are promoted to the old gen too quickly). In the below we constrain new gen 
size to 64m.
 
 Add the below line in _hbase-env.sh_:
 [source,bourne]
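The fenced block itself is truncated out of this hunk. A line that achieves the stated 64m cap would look like the following -- `-XX:NewSize`/`-XX:MaxNewSize` are standard HotSpot flags, but treat this as an illustrative sketch rather than the exact line from the document:

```shell
# hbase-env.sh: constrain the new generation to 64 MB (illustrative;
# tune NewSize/MaxNewSize for your own heap and workload)
export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=64m -XX:MaxNewSize=64m"
```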
@@ -443,7 +443,7 @@ java.lang.Thread.State: WAITING (on object monitor)
     at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
 ----
 
-A handler thread that's waiting for stuff to do (like put, delete, scan, etc):
+A handler thread that's waiting for stuff to do (like put, delete, scan, etc.):
 
 [source]
 ----
@@ -559,6 +559,14 @@ You can also tail all the logs at the same time, edit 
files, etc.
 
 For more information on the HBase client, see <<client,client>>.
 
+=== Missed Scan Results Due To Mismatch Of 
`hbase.client.scanner.max.result.size` Between Client and Server
+If either the client or server version is lower than 0.98.11/1.0.0 and the 
server
+has a smaller value for `hbase.client.scanner.max.result.size` than the 
client, scan
+requests that reach the server's `hbase.client.scanner.max.result.size` are 
likely
+to miss data. In particular, 0.98.11 defaults 
`hbase.client.scanner.max.result.size`
+to 2 MB but other versions default to larger values. For this reason, be very 
careful
+using 0.98.11 servers with any other client version.
+
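A defensive way to avoid the mismatch is to set the property explicitly and identically on both client and server. A sketch for _hbase-site.xml_ follows; the 2 MB value shown is the 0.98.11 default quoted above and is illustrative, not a recommendation:

```xml
<!-- hbase-site.xml, on BOTH client and server: pin the scan chunk size so
     the two defaults cannot diverge. 2097152 bytes = 2 MB. -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>2097152</value>
</property>
```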
 [[trouble.client.scantimeout]]
 === ScannerTimeoutException or UnknownScannerException
 
@@ -834,6 +842,31 @@ Two common use-cases for querying HDFS for HBase objects 
is research the degree
 If there are a large number of StoreFiles for each ColumnFamily it could 
indicate the need for a major compaction.
 Additionally, after a major compaction if the resulting StoreFile is "small" 
it could indicate the need for a reduction of ColumnFamilies for the table.
 
+=== Unexpected Filesystem Growth
+
+If you see an unexpected spike in filesystem usage by HBase, two possible 
culprits
+are snapshots and WALs.
+
+Snapshots::
+  When you create a snapshot, HBase retains everything it needs to recreate 
the table's
+  state at the time of the snapshot. This includes deleted cells or expired 
versions.
+  For this reason, your snapshot usage pattern should be well-planned, and you 
should
+  prune snapshots that you no longer need. Snapshots are stored in 
`/hbase/.snapshots`,
+  and archives needed to restore snapshots are stored in
+  `/hbase/.archive/<_tablename_>/<_region_>/<_column_family_>/`.
+
+  *Do not* manage snapshots or archives manually via HDFS. HBase provides APIs 
and
+  HBase Shell commands for managing them. For more information, see 
<<ops.snapshots>>.
+
+WAL::
+  Write-ahead logs (WALs) are stored in subdirectories of `/hbase/.logs/`, 
depending
+  on their status. Already-processed WALs are stored in 
`/hbase/.logs/oldWALs/` and
+  corrupt WALs are stored in `/hbase/.logs/.corrupt/` for examination.
+  If the size of any subdirectory of `/hbase/.logs/` is growing, examine the 
HBase
+  server logs to find the root cause for why WALs are not being processed 
correctly.
+
+*Do not* manage WALs manually via HDFS.
+
 [[trouble.network]]
 == Network
 
@@ -1037,7 +1070,7 @@ However, if the NotServingRegionException is logged 
ERROR, then the client ran o
 
 Fix your DNS.
 In versions of Apache HBase before 0.92.x, reverse DNS needs to give same 
answer as forward lookup.
-See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 
RegionServer is not using the name given it by the master; double entry in 
master listing of servers] for gorey details.
+See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 
RegionServer is not using the name given it by the master; double entry in 
master listing of servers] for gory details.
 
 [[brand.new.compressor]]
 ==== Logs flooded with '2011-01-10 12:40:48,407 INFO 
org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor' messages

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/unit_testing.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/unit_testing.adoc 
b/src/main/asciidoc/_chapters/unit_testing.adoc
index 3f70001..6f13864 100644
--- a/src/main/asciidoc/_chapters/unit_testing.adoc
+++ b/src/main/asciidoc/_chapters/unit_testing.adoc
@@ -47,7 +47,7 @@ public class MyHBaseDAO {
         Put put = createPut(obj);
         table.put(put);
     }
-    
+
     private static Put createPut(HBaseTestObj obj) {
         Put put = new Put(Bytes.toBytes(obj.getRowKey()));
         put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"),
@@ -96,13 +96,13 @@ public class TestMyHbaseDAOData {
 
 These tests ensure that your `createPut` method creates, populates, and 
returns a `Put` object with expected values.
 Of course, JUnit can do much more than this.
-For an introduction to JUnit, see 
link:https://github.com/junit-team/junit/wiki/Getting-started. 
+For an introduction to JUnit, see 
https://github.com/junit-team/junit/wiki/Getting-started.
 
 == Mockito
 
 Mockito is a mocking framework.
 It goes further than JUnit by allowing you to test the interactions between 
objects without having to replicate the entire environment.
-You can read more about Mockito at its project site, 
link:https://code.google.com/p/mockito/.
+You can read more about Mockito at its project site, 
https://code.google.com/p/mockito/.
 
 You can use Mockito to do unit testing on smaller units.
 For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a 
`org.apache.hadoop.hbase.master.MasterServices` interface reference rather than 
a full-blown `org.apache.hadoop.hbase.master.HMaster`.
@@ -133,7 +133,7 @@ public class TestMyHBaseDAO{
   Configuration config = HBaseConfiguration.create();
   @Mock
   Connection connection = ConnectionFactory.createConnection(config);
-  @Mock 
+  @Mock
   private Table table;
   @Captor
   private ArgumentCaptor putCaptor;
@@ -150,7 +150,7 @@ public class TestMyHBaseDAO{
     MyHBaseDAO.insertRecord(table, obj);
     verify(table).put(putCaptor.capture());
     Put put = putCaptor.getValue();
-  
+
     assertEquals(Bytes.toString(put.getRow()), obj.getRowKey());
     assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")));
     assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")));
@@ -182,7 +182,7 @@ public class MyReducer extends TableReducer<Text, Text, 
ImmutableBytesWritable>
    public static final byte[] CF = "CF".getBytes();
    public static final byte[] QUALIFIER = "CQ-1".getBytes();
    public void reduce(Text key, Iterable<Text> values, Context context) throws 
IOException, InterruptedException {
-     //bunch of processing to extract data to be inserted, in our case, lets 
say we are simply
+     //bunch of processing to extract data to be inserted, in our case, let's 
say we are simply
      //appending all the records we receive from the mapper for this particular
      //key and insert one record into HBase
      StringBuffer data = new StringBuffer();
@@ -197,7 +197,7 @@ public class MyReducer extends TableReducer<Text, Text, 
ImmutableBytesWritable>
  }
 ----
 
-To test this code, the first step is to add a dependency to MRUnit to your 
Maven POM file. 
+To test this code, the first step is to add a dependency to MRUnit to your 
Maven POM file.
 
 [source,xml]
 ----
@@ -225,16 +225,16 @@ public class MyReducerTest {
       MyReducer reducer = new MyReducer();
       reduceDriver = ReduceDriver.newReduceDriver(reducer);
     }
-  
+
    @Test
    public void testHBaseInsert() throws IOException {
-      String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1", 
+      String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1",
 strValue2 = "DATA2";
       List<Text> list = new ArrayList<Text>();
       list.add(new Text(strValue));
       list.add(new Text(strValue1));
       list.add(new Text(strValue2));
-      //since in our case all that the reducer is doing is appending the 
records that the mapper   
+      //since in our case all that the reducer is doing is appending the 
records that the mapper
       //sends it, we should get the following back
       String expectedOutput = strValue + strValue1 + strValue2;
      //Setup Input, mimic what mapper would have passed
@@ -242,10 +242,10 @@ strValue2 = "DATA2";
       reduceDriver.withInput(new Text(strKey), list);
       //run the reducer and get its output
       List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run();
-    
+
       //extract key from result and verify
       assertEquals(Bytes.toString(result.get(0).getFirst().get()), strKey);
-    
+
       //extract value for CF/QUALIFIER and verify
       Put a = (Put)result.get(0).getSecond();
       String c = Bytes.toString(a.get(CF, QUALIFIER).get(0).getValue());
@@ -259,7 +259,7 @@ Your MRUnit test verifies that the output is as expected, 
the Put that is insert
 
 MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to 
test other operations, including reading from HBase, processing data, or 
writing to HDFS.
 
-== Integration Testing with a HBase Mini-Cluster
+== Integration Testing with an HBase Mini-Cluster
 
 HBase ships with HBaseTestingUtility, which makes it easy to write integration 
tests using a [firstterm]_mini-cluster_.
 The first step is to add some dependencies to your Maven POM file.
@@ -283,7 +283,7 @@ Check the versions to be sure they are appropriate.
     <type>test-jar</type>
     <scope>test</scope>
 </dependency>
-        
+
 <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-hdfs</artifactId>
@@ -309,7 +309,7 @@ public class MyHBaseIntegrationTest {
     private static HBaseTestingUtility utility;
     byte[] CF = "CF".getBytes();
     byte[] QUALIFIER = "CQ-1".getBytes();
-    
+
     @Before
     public void setup() throws Exception {
        utility = new HBaseTestingUtility();
@@ -343,7 +343,7 @@ This code creates an HBase mini-cluster and starts it.
 Next, it creates a table called `MyTest` with one column family, `CF`.
 A record is inserted, a Get is performed from the same table, and the 
insertion is verified.
 
-NOTE: Starting the mini-cluster takes about 20-30 seconds, but that should be 
appropriate for integration testing. 
+NOTE: Starting the mini-cluster takes about 20-30 seconds, but that should be 
appropriate for integration testing.
 
 To use an HBase mini-cluster on Microsoft Windows, you need to use a Cygwin 
environment.
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/6f07973d/src/main/asciidoc/_chapters/upgrading.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/upgrading.adoc 
b/src/main/asciidoc/_chapters/upgrading.adoc
index 6b63833..6327c5a 100644
--- a/src/main/asciidoc/_chapters/upgrading.adoc
+++ b/src/main/asciidoc/_chapters/upgrading.adoc
@@ -92,7 +92,7 @@ In addition to the usual API versioning considerations HBase 
has other compatibi
 .Operational Compatibility
 * Metric changes
 * Behavioral changes of services
-* Web page APIs
+* JMX APIs exposed via the `/jmx/` endpoint
 
 .Summary
 * A patch upgrade is a drop-in replacement. Any change that is not Java binary 
compatible would not be allowed.footnote:[See 
http://docs.oracle.com/javase/specs/jls/se7/html/jls-13.html.]. Downgrading 
versions within patch releases may not be compatible.
@@ -132,7 +132,7 @@ HBase Client API::
 
 [[hbase.limitetprivate.api]]
 HBase LimitedPrivate API::
-  LimitedPrivate annotation comes with a set of target consumers for the 
interfaces. Those consumers are coprocessors, phoenix, replication endpoint 
implemnetations or similar. At this point, HBase only guarantees source and 
binary compatibility for these interfaces between patch versions.
+  The LimitedPrivate annotation comes with a set of target consumers for the 
interfaces. Those consumers are coprocessors, Phoenix, replication endpoint 
implementations, or similar. At this point, HBase only guarantees source and 
binary compatibility for these interfaces between patch versions.
 
 [[hbase.private.api]]
 HBase Private API::
@@ -158,7 +158,7 @@ When we say two HBase versions are compatible, we mean that 
the versions are wir
 
 A rolling upgrade is the process by which you update the servers in your 
cluster one server at a time. You can do a rolling upgrade across HBase versions if 
they are binary or wire compatible. See <<hbase.rolling.restart>> for more on 
what this means. Coarsely, a rolling upgrade is a graceful stop of each server, 
an update of the software, and then a restart. You do this for each server in the 
cluster. Usually you upgrade the Master first and then the RegionServers. See 
<<rolling>> for tools that can help with the rolling upgrade process.
 
-For example, in the below, HBase was symlinked to the actual HBase install. On 
upgrade, before running a rolling restart over the cluser, we changed the 
symlink to point at the new HBase software version and then ran
+For example, in the below, HBase was symlinked to the actual HBase install. On 
upgrade, before running a rolling restart over the cluster, we changed the 
symlink to point at the new HBase software version and then ran
 
 [source,bash]
 ----
@@ -192,9 +192,15 @@ See <<zookeeper.requirements>>.
 .HBase Default Ports Changed
 The ports used by HBase changed. They used to be in the 600XX range. In HBase 
1.0.0 they have been moved up out of the ephemeral port range and are 160XX 
instead (Master web UI was 60010 and is now 16010; the RegionServer web UI was 
60030 and is now 16030, etc.). If you want to keep the old port locations, copy 
the port setting configs from _hbase-default.xml_ into _hbase-site.xml_, change 
them back to the old values from the HBase 0.98.x era, and ensure you've 
distributed your configurations before you restart.
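If you do opt to keep the old locations, the copied settings look like the sketch below. The two property names shown are the standard web UI port settings; confirm the full set of port properties against the _hbase-default.xml_ shipped with your version before relying on this:

```xml
<!-- hbase-site.xml: restore the pre-1.0 web UI ports (illustrative subset;
     copy every port property you care about from hbase-default.xml) -->
<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60030</value>
</property>
```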
 
+.HBase Master Port Binding Change
+In HBase 1.0.x, the HBase Master binds the RegionServer ports as well as the 
Master
+ports. This behavior is changed from HBase versions prior to 1.0. In HBase 1.1 
and 2.0 branches,
+this behavior is reverted to the pre-1.0 behavior of the HBase master not 
binding the RegionServer
+ports.
+
 [[upgrade1.0.hbase.bucketcache.percentage.in.combinedcache]]
 .hbase.bucketcache.percentage.in.combinedcache configuration has been REMOVED
-You may have made use of this configuration if you are using BucketCache. If 
NOT using BucketCache, this change does not effect you. Its removal means that 
your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the 
way you would size the on-heap L1 LruBlockCache if you were NOT doing 
BucketCache -- and the BucketCache size is not whatever the setting for 
`hbase.bucketcache.size` is. You may need to adjust configs to get the 
LruBlockCache and BucketCache sizes set to what they were in 0.98.x and 
previous. If you did not set this config., its default value was 0.9. If you do 
nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache 
will become `hfile.block.cache.size` times your java heap size 
(`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see 
link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify 
offheap cache config by removing the confusing 
"hbase.bucketcache.percentage.in.combinedcache"].
+You may have made use of this configuration if you are using BucketCache. If 
NOT using BucketCache, this change does not affect you. Its removal means that 
your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the 
way you would size the on-heap L1 LruBlockCache if you were NOT doing 
BucketCache -- and the BucketCache size is now whatever the setting for 
`hbase.bucketcache.size` is. You may need to adjust configs to get the 
LruBlockCache and BucketCache sizes set to what they were in 0.98.x and 
previous. If you did not set this configuration, its default value was 0.9. If you do 
nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache 
will become `hfile.block.cache.size` times your Java heap size 
(`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see 
link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify 
offheap cache config by removing the confusing 
"hbase.bucketcache.percentage.in.combinedcache"].
 
 [[hbase-12068]]
 .If you have your own custom filters.
@@ -204,6 +210,14 @@ See the release notes on the issue 
link:https://issues.apache.org/jira/browse/HB
 .Distributed Log Replay
 <<distributed.log.replay>> is off by default in HBase 1.0.0. Enabling it can 
make a big difference improving HBase MTTR. Enable this feature if you are 
doing a clean stop/start when you are upgrading. You cannot rolling upgrade to 
this feature (caveat if you are running on a version of HBase in excess of 
HBase 0.98.4 -- see 
link:https://issues.apache.org/jira/browse/HBASE-12577[HBASE-12577 Disable 
distributed log replay by default] for more).
 
+.Mismatch Of `hbase.client.scanner.max.result.size` Between Client and Server
+If either the client or server version is lower than 0.98.11/1.0.0 and the 
server
+has a smaller value for `hbase.client.scanner.max.result.size` than the 
client, scan
+requests that reach the server's `hbase.client.scanner.max.result.size` are 
likely
+to miss data. In particular, 0.98.11 defaults 
`hbase.client.scanner.max.result.size`
+to 2 MB but other versions default to larger values. For this reason, be very 
careful
+using 0.98.11 servers with any other client version.
+
 [[upgrade1.0.rolling.upgrade]]
 ==== Rolling upgrade from 0.98.x to HBase 1.0.0
 .From 0.96.x to 1.0.0
@@ -378,7 +392,7 @@ The migration is a one-time event. However, every time your 
cluster starts, `MET
 
 [[upgrade0.94]]
 === Upgrading from 0.92.x to 0.94.x
-We used to think that 0.92 and 0.94 were interface compatible and that you can 
do a rolling upgrade between these versions but then we figured that 
link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder 
pattern in HColumnDescriptor] changed method signatures so rather than return 
`void` they instead return `HColumnDescriptor`. This will 
throw`java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 
are NOT compatible. You cannot do a rolling upgrade between them.
+We used to think that 0.92 and 0.94 were interface compatible and that you can 
do a rolling upgrade between these versions but then we figured that 
link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder 
pattern in HColumnDescriptor] changed method signatures so rather than return 
`void` they instead return `HColumnDescriptor`. This will throw 
`java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 
are NOT compatible. You cannot do a rolling upgrade between them.
 
 [[upgrade0.92]]
 === Upgrading from 0.90.x to 0.92.x
