http://git-wip-us.apache.org/repos/asf/hbase-site/blob/eb5d2c62/book.html ---------------------------------------------------------------------- diff --git a/book.html b/book.html index 33f6b3f..9483e80 100644 --- a/book.html +++ b/book.html @@ -162,173 +162,180 @@ <li><a href="#br.limitations">91. Limitations of the Backup and Restore Utility</a></li> </ul> </li> +<li><a href="#syncreplication">Synchronous Replication</a> +<ul class="sectlevel1"> +<li><a href="#_background">92. Background</a></li> +<li><a href="#_design">93. Design</a></li> +<li><a href="#_operation_and_maintenance">94. Operation and maintenance</a></li> +</ul> +</li> <li><a href="#hbase_apis">Apache HBase APIs</a> <ul class="sectlevel1"> -<li><a href="#_examples">92. Examples</a></li> +<li><a href="#_examples">95. Examples</a></li> </ul> </li> <li><a href="#external_apis">Apache HBase External APIs</a> <ul class="sectlevel1"> -<li><a href="#_rest">93. REST</a></li> -<li><a href="#_thrift">94. Thrift</a></li> -<li><a href="#c">95. C/C++ Apache HBase Client</a></li> -<li><a href="#jdo">96. Using Java Data Objects (JDO) with HBase</a></li> -<li><a href="#scala">97. Scala</a></li> -<li><a href="#jython">98. Jython</a></li> +<li><a href="#_rest">96. REST</a></li> +<li><a href="#_thrift">97. Thrift</a></li> +<li><a href="#c">98. C/C++ Apache HBase Client</a></li> +<li><a href="#jdo">99. Using Java Data Objects (JDO) with HBase</a></li> +<li><a href="#scala">100. Scala</a></li> +<li><a href="#jython">101. Jython</a></li> </ul> </li> <li><a href="#thrift">Thrift API and Filter Language</a> <ul class="sectlevel1"> -<li><a href="#thrift.filter_language">99. Filter Language</a></li> +<li><a href="#thrift.filter_language">102. Filter Language</a></li> </ul> </li> <li><a href="#spark">HBase and Spark</a> <ul class="sectlevel1"> -<li><a href="#_basic_spark">100. Basic Spark</a></li> -<li><a href="#_spark_streaming">101. Spark Streaming</a></li> -<li><a href="#_bulk_load">102. 
Bulk Load</a></li> -<li><a href="#_sparksql_dataframes">103. SparkSQL/DataFrames</a></li> +<li><a href="#_basic_spark">103. Basic Spark</a></li> +<li><a href="#_spark_streaming">104. Spark Streaming</a></li> +<li><a href="#_bulk_load">105. Bulk Load</a></li> +<li><a href="#_sparksql_dataframes">106. SparkSQL/DataFrames</a></li> </ul> </li> <li><a href="#cp">Apache HBase Coprocessors</a> <ul class="sectlevel1"> -<li><a href="#_coprocessor_overview">104. Coprocessor Overview</a></li> -<li><a href="#_types_of_coprocessors">105. Types of Coprocessors</a></li> -<li><a href="#cp_loading">106. Loading Coprocessors</a></li> -<li><a href="#cp_example">107. Examples</a></li> -<li><a href="#_guidelines_for_deploying_a_coprocessor">108. Guidelines For Deploying A Coprocessor</a></li> -<li><a href="#_restricting_coprocessor_usage">109. Restricting Coprocessor Usage</a></li> +<li><a href="#_coprocessor_overview">107. Coprocessor Overview</a></li> +<li><a href="#_types_of_coprocessors">108. Types of Coprocessors</a></li> +<li><a href="#cp_loading">109. Loading Coprocessors</a></li> +<li><a href="#cp_example">110. Examples</a></li> +<li><a href="#_guidelines_for_deploying_a_coprocessor">111. Guidelines For Deploying A Coprocessor</a></li> +<li><a href="#_restricting_coprocessor_usage">112. Restricting Coprocessor Usage</a></li> </ul> </li> <li><a href="#performance">Apache HBase Performance Tuning</a> <ul class="sectlevel1"> -<li><a href="#perf.os">110. Operating System</a></li> -<li><a href="#perf.network">111. Network</a></li> -<li><a href="#jvm">112. Java</a></li> -<li><a href="#perf.configurations">113. HBase Configurations</a></li> -<li><a href="#perf.zookeeper">114. ZooKeeper</a></li> -<li><a href="#perf.schema">115. Schema Design</a></li> -<li><a href="#perf.general">116. HBase General Patterns</a></li> -<li><a href="#perf.writing">117. Writing to HBase</a></li> -<li><a href="#perf.reading">118. Reading from HBase</a></li> -<li><a href="#perf.deleting">119. 
Deleting from HBase</a></li> -<li><a href="#perf.hdfs">120. HDFS</a></li> -<li><a href="#perf.ec2">121. Amazon EC2</a></li> -<li><a href="#perf.hbase.mr.cluster">122. Collocating HBase and MapReduce</a></li> -<li><a href="#perf.casestudy">123. Case Studies</a></li> +<li><a href="#perf.os">113. Operating System</a></li> +<li><a href="#perf.network">114. Network</a></li> +<li><a href="#jvm">115. Java</a></li> +<li><a href="#perf.configurations">116. HBase Configurations</a></li> +<li><a href="#perf.zookeeper">117. ZooKeeper</a></li> +<li><a href="#perf.schema">118. Schema Design</a></li> +<li><a href="#perf.general">119. HBase General Patterns</a></li> +<li><a href="#perf.writing">120. Writing to HBase</a></li> +<li><a href="#perf.reading">121. Reading from HBase</a></li> +<li><a href="#perf.deleting">122. Deleting from HBase</a></li> +<li><a href="#perf.hdfs">123. HDFS</a></li> +<li><a href="#perf.ec2">124. Amazon EC2</a></li> +<li><a href="#perf.hbase.mr.cluster">125. Collocating HBase and MapReduce</a></li> +<li><a href="#perf.casestudy">126. Case Studies</a></li> </ul> </li> <li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#trouble.general">124. General Guidelines</a></li> -<li><a href="#trouble.log">125. Logs</a></li> -<li><a href="#trouble.resources">126. Resources</a></li> -<li><a href="#trouble.tools">127. Tools</a></li> -<li><a href="#trouble.client">128. Client</a></li> -<li><a href="#trouble.mapreduce">129. MapReduce</a></li> -<li><a href="#trouble.namenode">130. NameNode</a></li> -<li><a href="#trouble.network">131. Network</a></li> -<li><a href="#trouble.rs">132. RegionServer</a></li> -<li><a href="#trouble.master">133. Master</a></li> -<li><a href="#trouble.zookeeper">134. ZooKeeper</a></li> -<li><a href="#trouble.ec2">135. Amazon EC2</a></li> -<li><a href="#trouble.versions">136. HBase and Hadoop version issues</a></li> -<li><a href="#_hbase_and_hdfs">137. 
HBase and HDFS</a></li> -<li><a href="#trouble.tests">138. Running unit or integration tests</a></li> -<li><a href="#trouble.casestudy">139. Case Studies</a></li> -<li><a href="#trouble.crypto">140. Cryptographic Features</a></li> -<li><a href="#_operating_system_specific_issues">141. Operating System Specific Issues</a></li> -<li><a href="#_jdk_issues">142. JDK Issues</a></li> +<li><a href="#trouble.general">127. General Guidelines</a></li> +<li><a href="#trouble.log">128. Logs</a></li> +<li><a href="#trouble.resources">129. Resources</a></li> +<li><a href="#trouble.tools">130. Tools</a></li> +<li><a href="#trouble.client">131. Client</a></li> +<li><a href="#trouble.mapreduce">132. MapReduce</a></li> +<li><a href="#trouble.namenode">133. NameNode</a></li> +<li><a href="#trouble.network">134. Network</a></li> +<li><a href="#trouble.rs">135. RegionServer</a></li> +<li><a href="#trouble.master">136. Master</a></li> +<li><a href="#trouble.zookeeper">137. ZooKeeper</a></li> +<li><a href="#trouble.ec2">138. Amazon EC2</a></li> +<li><a href="#trouble.versions">139. HBase and Hadoop version issues</a></li> +<li><a href="#_hbase_and_hdfs">140. HBase and HDFS</a></li> +<li><a href="#trouble.tests">141. Running unit or integration tests</a></li> +<li><a href="#trouble.casestudy">142. Case Studies</a></li> +<li><a href="#trouble.crypto">143. Cryptographic Features</a></li> +<li><a href="#_operating_system_specific_issues">144. Operating System Specific Issues</a></li> +<li><a href="#_jdk_issues">145. JDK Issues</a></li> </ul> </li> <li><a href="#casestudies">Apache HBase Case Studies</a> <ul class="sectlevel1"> -<li><a href="#casestudies.overview">143. Overview</a></li> -<li><a href="#casestudies.schema">144. Schema Design</a></li> -<li><a href="#casestudies.perftroub">145. Performance/Troubleshooting</a></li> +<li><a href="#casestudies.overview">146. Overview</a></li> +<li><a href="#casestudies.schema">147. Schema Design</a></li> +<li><a href="#casestudies.perftroub">148. 
Performance/Troubleshooting</a></li> </ul> </li> <li><a href="#ops_mgt">Apache HBase Operational Management</a> <ul class="sectlevel1"> -<li><a href="#tools">146. HBase Tools and Utilities</a></li> -<li><a href="#ops.regionmgt">147. Region Management</a></li> -<li><a href="#node.management">148. Node Management</a></li> -<li><a href="#hbase_metrics">149. HBase Metrics</a></li> -<li><a href="#ops.monitoring">150. HBase Monitoring</a></li> -<li><a href="#_cluster_replication">151. Cluster Replication</a></li> -<li><a href="#_running_multiple_workloads_on_a_single_cluster">152. Running Multiple Workloads On a Single Cluster</a></li> -<li><a href="#ops.backup">153. HBase Backup</a></li> -<li><a href="#ops.snapshots">154. HBase Snapshots</a></li> -<li><a href="#snapshots_azure">155. Storing Snapshots in Microsoft Azure Blob Storage</a></li> -<li><a href="#ops.capacity">156. Capacity Planning and Region Sizing</a></li> -<li><a href="#table.rename">157. Table Rename</a></li> -<li><a href="#rsgroup">158. RegionServer Grouping</a></li> -<li><a href="#normalizer">159. Region Normalizer</a></li> +<li><a href="#tools">149. HBase Tools and Utilities</a></li> +<li><a href="#ops.regionmgt">150. Region Management</a></li> +<li><a href="#node.management">151. Node Management</a></li> +<li><a href="#hbase_metrics">152. HBase Metrics</a></li> +<li><a href="#ops.monitoring">153. HBase Monitoring</a></li> +<li><a href="#_cluster_replication">154. Cluster Replication</a></li> +<li><a href="#_running_multiple_workloads_on_a_single_cluster">155. Running Multiple Workloads On a Single Cluster</a></li> +<li><a href="#ops.backup">156. HBase Backup</a></li> +<li><a href="#ops.snapshots">157. HBase Snapshots</a></li> +<li><a href="#snapshots_azure">158. Storing Snapshots in Microsoft Azure Blob Storage</a></li> +<li><a href="#ops.capacity">159. Capacity Planning and Region Sizing</a></li> +<li><a href="#table.rename">160. Table Rename</a></li> +<li><a href="#rsgroup">161. 
RegionServer Grouping</a></li> +<li><a href="#normalizer">162. Region Normalizer</a></li> </ul> </li> <li><a href="#developer">Building and Developing Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#getting.involved">160. Getting Involved</a></li> -<li><a href="#repos">161. Apache HBase Repositories</a></li> -<li><a href="#_ides">162. IDEs</a></li> -<li><a href="#build">163. Building Apache HBase</a></li> -<li><a href="#releasing">164. Releasing Apache HBase</a></li> -<li><a href="#hbase.rc.voting">165. Voting on Release Candidates</a></li> -<li><a href="#hbase.release.announcement">166. Announcing Releases</a></li> -<li><a href="#documentation">167. Generating the HBase Reference Guide</a></li> -<li><a href="#hbase.org">168. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> -<li><a href="#hbase.tests">169. Tests</a></li> -<li><a href="#developing">170. Developer Guidelines</a></li> +<li><a href="#getting.involved">163. Getting Involved</a></li> +<li><a href="#repos">164. Apache HBase Repositories</a></li> +<li><a href="#_ides">165. IDEs</a></li> +<li><a href="#build">166. Building Apache HBase</a></li> +<li><a href="#releasing">167. Releasing Apache HBase</a></li> +<li><a href="#hbase.rc.voting">168. Voting on Release Candidates</a></li> +<li><a href="#hbase.release.announcement">169. Announcing Releases</a></li> +<li><a href="#documentation">170. Generating the HBase Reference Guide</a></li> +<li><a href="#hbase.org">171. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> +<li><a href="#hbase.tests">172. Tests</a></li> +<li><a href="#developing">173. Developer Guidelines</a></li> </ul> </li> <li><a href="#unit.tests">Unit Testing HBase Applications</a> <ul class="sectlevel1"> -<li><a href="#_junit">171. JUnit</a></li> -<li><a href="#mockito">172. Mockito</a></li> -<li><a href="#_mrunit">173. MRUnit</a></li> -<li><a href="#_integration_testing_with_an_hbase_mini_cluster">174. 
Integration Testing with an HBase Mini-Cluster</a></li> +<li><a href="#_junit">174. JUnit</a></li> +<li><a href="#mockito">175. Mockito</a></li> +<li><a href="#_mrunit">176. MRUnit</a></li> +<li><a href="#_integration_testing_with_an_hbase_mini_cluster">177. Integration Testing with an HBase Mini-Cluster</a></li> </ul> </li> <li><a href="#protobuf">Protobuf in HBase</a> <ul class="sectlevel1"> -<li><a href="#_protobuf">175. Protobuf</a></li> +<li><a href="#_protobuf">178. Protobuf</a></li> </ul> </li> <li><a href="#pv2">Procedure Framework (Pv2): <a href="https://issues.apache.org/jira/browse/HBASE-12439">HBASE-12439</a></a> <ul class="sectlevel1"> -<li><a href="#_procedures">176. Procedures</a></li> -<li><a href="#_subprocedures">177. Subprocedures</a></li> -<li><a href="#_procedureexecutor">178. ProcedureExecutor</a></li> -<li><a href="#_nonces">179. Nonces</a></li> -<li><a href="#_wait_wake_suspend_yield">180. Wait/Wake/Suspend/Yield</a></li> -<li><a href="#_locking">181. Locking</a></li> -<li><a href="#_procedure_types">182. Procedure Types</a></li> -<li><a href="#_references">183. References</a></li> +<li><a href="#_procedures">179. Procedures</a></li> +<li><a href="#_subprocedures">180. Subprocedures</a></li> +<li><a href="#_procedureexecutor">181. ProcedureExecutor</a></li> +<li><a href="#_nonces">182. Nonces</a></li> +<li><a href="#_wait_wake_suspend_yield">183. Wait/Wake/Suspend/Yield</a></li> +<li><a href="#_locking">184. Locking</a></li> +<li><a href="#_procedure_types">185. Procedure Types</a></li> +<li><a href="#_references">186. References</a></li> </ul> </li> <li><a href="#amv2">AMv2 Description for Devs</a> <ul class="sectlevel1"> -<li><a href="#_background">184. Background</a></li> -<li><a href="#_new_system">185. New System</a></li> -<li><a href="#_procedures_detail">186. Procedures Detail</a></li> -<li><a href="#_ui">187. UI</a></li> -<li><a href="#_logging">188. Logging</a></li> -<li><a href="#_implementation_notes">189. 
Implementation Notes</a></li> -<li><a href="#_new_configs">190. New Configs</a></li> -<li><a href="#_tools">191. Tools</a></li> +<li><a href="#_background_2">187. Background</a></li> +<li><a href="#_new_system">188. New System</a></li> +<li><a href="#_procedures_detail">189. Procedures Detail</a></li> +<li><a href="#_ui">190. UI</a></li> +<li><a href="#_logging">191. Logging</a></li> +<li><a href="#_implementation_notes">192. Implementation Notes</a></li> +<li><a href="#_new_configs">193. New Configs</a></li> +<li><a href="#_tools">194. Tools</a></li> </ul> </li> <li><a href="#zookeeper">ZooKeeper</a> <ul class="sectlevel1"> -<li><a href="#_using_existing_zookeeper_ensemble">192. Using existing ZooKeeper ensemble</a></li> -<li><a href="#zk.sasl.auth">193. SASL Authentication with ZooKeeper</a></li> +<li><a href="#_using_existing_zookeeper_ensemble">195. Using existing ZooKeeper ensemble</a></li> +<li><a href="#zk.sasl.auth">196. SASL Authentication with ZooKeeper</a></li> </ul> </li> <li><a href="#community">Community</a> <ul class="sectlevel1"> -<li><a href="#_decisions">194. Decisions</a></li> -<li><a href="#community.roles">195. Community Roles</a></li> -<li><a href="#hbase.commit.msg.format">196. Commit Message format</a></li> +<li><a href="#_decisions">197. Decisions</a></li> +<li><a href="#community.roles">198. Community Roles</a></li> +<li><a href="#hbase.commit.msg.format">199. Commit Message format</a></li> </ul> </li> <li><a href="#_appendix">Appendix</a> @@ -346,11 +353,11 @@ <li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li> <li><a href="#orca">Appendix L: Apache HBase Orca</a></li> <li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li> -<li><a href="#tracing.client.modifications">197. Client Modifications</a></li> -<li><a href="#tracing.client.shell">198. Tracing from HBase Shell</a></li> +<li><a href="#tracing.client.modifications">200. 
Client Modifications</a></li> +<li><a href="#tracing.client.shell">201. Tracing from HBase Shell</a></li> <li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li> <li><a href="#_known_incompatibilities_among_hbase_versions">Appendix O: Known Incompatibilities Among HBase Versions</a></li> -<li><a href="#_hbase_2_0_incompatible_changes">199. HBase 2.0 Incompatible Changes</a></li> +<li><a href="#_hbase_2_0_incompatible_changes">202. HBase 2.0 Incompatible Changes</a></li> </ul> </li> </ul> @@ -7170,7 +7177,81 @@ good justification to add it back, bring it our notice (<a href="mailto:dev@hbas <div class="sect3"> <h4 id="upgrade2.0.rolling.upgrades"><a class="anchor" href="#upgrade2.0.rolling.upgrades"></a>13.1.3. Rolling Upgrade from 1.x to 2.x</h4> <div class="paragraph"> -<p>There is no rolling upgrade from HBase 1.x+ to HBase 2.x+. In order to perform a zero downtime upgrade, you will need to run an additional cluster in parallel and handle failover in application logic.</p> +<p>Rolling upgrades are currently an experimental feature. +They have had only limited testing, and there are likely corner +cases our limited experience has not yet uncovered, +so be careful if you go this +route. The stop/upgrade/start process described in the next section, +<a href="#upgrade2.0.process">Upgrade process from 1.x to 2.x</a>, is the safest route.</p> +</div> +<div class="paragraph"> +<p>That said, what follows is a prescription for a +rolling upgrade of a 1.4 cluster.</p> +</div> +<div class="ulist"> +<div class="title">Pre-Requirements</div> +<ul> +<li> +<p>Upgrade to the latest 1.4.x release. Pre-1.4 releases may also work but are not tested, so please upgrade to 1.4.3+ before upgrading to 2.x, unless you are an expert and familiar with region assignment and crash processing. 
See the section <a href="#upgrade1.4">Upgrading from pre-1.4 to 1.4+</a> on how to upgrade to 1.4.x.</p> +</li> +<li> +<p>Make sure that zk-less assignment is enabled, i.e., set <code>hbase.assignment.usezk</code> to <code>false</code>. This is the most important thing: it allows the 1.x master to assign/unassign regions to/from 2.x region servers. See the release note section of <a href="https://issues.apache.org/jira/browse/HBASE-11059">HBASE-11059</a> on how to migrate from zk based assignment to zk less assignment.</p> +</li> +<li> +<p>We have tested rolling upgrades from 1.4.3 to 2.1.0, but it should also work if you want to upgrade to 2.0.x.</p> +</li> +</ul> +</div> +<div class="olist arabic"> +<div class="title">Instructions</div> +<ol class="arabic"> +<li> +<p>Unload a region server and upgrade it to 2.1.0. With <a href="https://issues.apache.org/jira/browse/HBASE-17931">HBASE-17931</a> in place, the meta region and the regions for other system tables will be moved to this region server immediately. If not, please move them manually to the new region server. This is very important because:</p> +<div class="ulist"> +<ul> +<li> +<p>The schema of the meta region is hard-coded. If meta is on an old region server, the new region servers cannot access it, as it lacks some column families (for example, table state).</p> +</li> +<li> +<p>A client with a lower version can communicate with a server with a higher version, but not vice versa. If the meta region is on an old region server, the new region server would use a higher-version client to communicate with a lower-version server, which may introduce strange problems.</p> +</li> +</ul> +</div> +</li> +<li> +<p>Rolling-upgrade all other region servers.</p> +</li> +<li> +<p>Upgrade the masters.</p> +</li> +</ol> +</div> +<div class="paragraph"> +<p>It is OK if region servers crash during the rolling upgrade. 
The 1.x master can assign regions to both 1.x and 2.x region servers, and <a href="https://issues.apache.org/jira/browse/HBASE-19166">HBASE-19166</a> fixed a problem so that a 1.x region server can also read and split the WALs written by a 2.x region server.</p> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +Please read the <a href="#_changes_of_note">Changes of Note!</a> section carefully before performing a rolling upgrade. Make sure that you do not use features removed in 2.0, for example the prefix-tree encoding or the old HFile format. They could fail the upgrade and leave the cluster in an intermediate state that is hard to recover from. +</td> +</tr> +</table> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +If you have success running this prescription, please notify the dev list with a note on your experience and/or update the above with any deviations you may have taken, so that others going this route can benefit from your efforts. +</td> +</tr> +</table> </div> </div> <div class="sect3"> @@ -19616,6 +19697,160 @@ which present information for consumption (<a href="https://issues.apache.org/ji </div> </div> </div> +<h1 id="syncreplication" class="sect0"><a class="anchor" href="#syncreplication"></a>Synchronous Replication</h1> +<div class="sect1"> +<h2 id="_background"><a class="anchor" href="#_background"></a>92. Background</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>The current <a href="#_cluster_replication">replication</a> in HBase is asynchronous, so if the master cluster crashes, the slave cluster may not have the +newest data. If users want strong consistency, they cannot switch to the slave cluster.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="_design"><a class="anchor" href="#_design"></a>93. 
Design</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>Please see the design doc on <a href="https://issues.apache.org/jira/browse/HBASE-19064">HBASE-19064</a></p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="_operation_and_maintenance"><a class="anchor" href="#_operation_and_maintenance"></a>94. Operation and maintenance</h2> +<div class="sectionbody"> +<div class="dlist"> +<dl> +<dt class="hdlist1">Case.1 Setup two synchronous replication clusters</dt> +<dd> +<div class="ulist"> +<ul> +<li> +<p>Add a synchronous peer in both source cluster and peer cluster.</p> +</li> +</ul> +</div> +</dd> +</dl> +</div> +<div class="paragraph"> +<p>For source cluster:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> add_peer <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="constant">CLUSTER_KEY</span> => <span class="string"><span class="delimiter">'</span><span class="content">lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase-slave</span><span class="delimiter">'</span></span>, <span class="constant">REMOTE_WAL_DIR</span>=><span class="string"><span class="delimiter">'</span><span class="content">hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase-slave/remoteWALs</span><span class="delimiter">'</span></span>, <span class="constant">TABLE_CFS</span> => {<span class="string"><span class="delimiter">"</span><span class="content">ycsb-test</span><span class="delimiter">"</span></span>=>[]}</code></pre> +</div> +</div> +<div class="paragraph"> +<p>For peer cluster:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> add_peer <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span 
class="constant">CLUSTER_KEY</span> => <span class="string"><span class="delimiter">'</span><span class="content">lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase</span><span class="delimiter">'</span></span>, <span class="constant">REMOTE_WAL_DIR</span>=><span class="string"><span class="delimiter">'</span><span class="content">hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase/remoteWALs</span><span class="delimiter">'</span></span>, <span class="constant">TABLE_CFS</span> => {<span class="string"><span class="delimiter">"</span><span class="content">ycsb-test</span><span class="delimiter">"</span></span>=>[]}</code></pre> +</div> +</div> +<div class="admonitionblock note"> +<table> +<tr> +<td class="icon"> +<i class="fa icon-note" title="Note"></i> +</td> +<td class="content"> +For synchronous replication, the current implementation requires that we have the same peer id for both the source +and the peer cluster. Another thing that needs attention: the peer does not support cluster-level, namespace-level, or +cf-level replication; only table-level replication is supported now. 
+</td> +</tr> +</table> +</div> +<div class="ulist"> +<ul> +<li> +<p>Transit the peer cluster to STANDBY state</p> +</li> +</ul> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">STANDBY</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="ulist"> +<ul> +<li> +<p>Transit the source cluster to ACTIVE state</p> +</li> +</ul> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">ACTIVE</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="paragraph"> +<p>Now synchronous replication has been set up successfully. The HBase client can only send requests to the source cluster; if +it sends requests to the peer cluster, the peer cluster, which is now in STANDBY state, will reject the read/write requests.</p> +</div> +<div class="dlist"> +<dl> +<dt class="hdlist1">Case.2 How to operate when the standby cluster has crashed</dt> +<dd> +<p>If the standby cluster has crashed, writing the remote WAL for the active cluster will fail. 
So we need to transit +the source cluster to DOWNGRADE_ACTIVE state, which means the source cluster won’t write any remote WAL any more, but +normal (asynchronous) replication still works fine: it queues the newly written WALs, and +replication is blocked until the peer cluster comes back.</p> +</dd> +</dl> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">DOWNGRADE_ACTIVE</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="paragraph"> +<p>Once the peer cluster comes back, we can just transit the source cluster to ACTIVE to ensure that the replication will be +synchronous.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">ACTIVE</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="dlist"> +<dl> +<dt class="hdlist1">Case.3 How to operate when the active cluster has crashed</dt> +<dd> +<p>If the active cluster has crashed (it may be unreachable now), just transit the standby cluster to +DOWNGRADE_ACTIVE state, and after that redirect all client requests to the DOWNGRADE_ACTIVE cluster.</p> +</dd> +</dl> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span 
class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">DOWNGRADE_ACTIVE</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="paragraph"> +<p>If the crashed cluster comes back, we just need to transit it to STANDBY directly. If instead you transit the +cluster to DOWNGRADE_ACTIVE, the original ACTIVE cluster may have redundant data compared to the current ACTIVE +cluster. Because we write the source cluster WALs and the remote cluster WALs concurrently by design, it’s possible that +the source cluster WALs have more data than the remote cluster, which results in data inconsistency. The procedure of +transiting ACTIVE to STANDBY has no such problem, because we’ll skip replaying the original WALs.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">STANDBY</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +<div class="paragraph"> +<p>After that, we can promote the DOWNGRADE_ACTIVE cluster to ACTIVE, to ensure that replication will be synchronous.</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase> transit_peer_sync_replication_state <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content">ACTIVE</span><span class="delimiter">'</span></span></code></pre> +</div> +</div> +</div> +</div> <h1 id="hbase_apis" class="sect0"><a class="anchor" href="#hbase_apis"></a>Apache HBase APIs</h1> <div class="openblock partintro"> <div class="content"> 
@@ -19631,7 +19866,7 @@ See <a href="#external_apis">Apache HBase External APIs</a> for more information </div> </div> <div class="sect1"> -<h2 id="_examples"><a class="anchor" href="#_examples"></a>92. Examples</h2> +<h2 id="_examples"><a class="anchor" href="#_examples"></a>95. Examples</h2> <div class="sectionbody"> <div class="exampleblock"> <div class="title">Example 25. Create, modify and delete a Table Using Java</div> @@ -19742,7 +19977,7 @@ through custom protocols. For information on using the native HBase APIs, refer </div> </div> <div class="sect1"> -<h2 id="_rest"><a class="anchor" href="#_rest"></a>93. REST</h2> +<h2 id="_rest"><a class="anchor" href="#_rest"></a>96. REST</h2> <div class="sectionbody"> <div class="paragraph"> <p>Representational State Transfer (REST) was introduced in 2000 in the doctoral @@ -19758,7 +19993,7 @@ There is also a nice series of blogs on by Jesse Anderson.</p> </div> <div class="sect2"> -<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>93.1. Starting and Stopping the REST Server</h3> +<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>96.1. Starting and Stopping the REST Server</h3> <div class="paragraph"> <p>The included REST server can run as a daemon which starts an embedded Jetty servlet container and deploys the servlet into it. Use one of the following commands @@ -19785,7 +20020,7 @@ following command if you were running it in the background.</p> </div> </div> <div class="sect2"> -<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>93.2. Configuring the REST Server and Client</h3> +<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>96.2. 
Configuring the REST Server and Client</h3> <div class="paragraph"> <p>For information about configuring the REST server and client for SSL, as well as <code>doAs</code> impersonation for the REST server, see <a href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on Behalf of the Client</a> and other portions @@ -19793,7 +20028,7 @@ of the <a href="#security">Securing Apache HBase</a> chapter.</p> </div> </div> <div class="sect2"> -<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>93.3. Using REST Endpoints</h3> +<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>96.3. Using REST Endpoints</h3> <div class="paragraph"> <p>The following examples use the placeholder server http://example.com:8000, and the following commands can all be run using <code>curl</code> or <code>wget</code> commands. You can request @@ -20160,7 +20395,7 @@ curl -vi -X PUT \ </table> </div> <div class="sect2"> -<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>93.4. REST XML Schema</h3> +<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>96.4. REST XML Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="xml"><span class="tag"><schema</span> <span class="attribute-name">xmlns</span>=<span class="string"><span class="delimiter">"</span><span class="content">http://www.w3.org/2001/XMLSchema</span><span class="delimiter">"</span></span> <span class="attribute-name">xmlns:tns</span>=<span class="string"><span class="delimiter">"</span><span class="content">RESTSchema</span><span class="delimiter">"</span></span><span class="tag">></span> @@ -20318,7 +20553,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect2"> -<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>93.5. REST Protobufs Schema</h3> +<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>96.5. 
REST Protobufs Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="json"><span class="error">m</span><span class="error">e</span><span class="error">s</span><span class="error">s</span><span class="error">a</span><span class="error">g</span><span class="error">e</span> <span class="error">V</span><span class="error">e</span><span class="error">r</span><span class="error">s</span><span class="error">i</span><span class="error">o</span><span class="error">n</span> { @@ -20426,7 +20661,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>94. Thrift</h2> +<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>97. Thrift</h2> <div class="sectionbody"> <div class="paragraph"> <p>Documentation about Thrift has moved to <a href="#thrift">Thrift API and Filter Language</a>.</p> @@ -20434,7 +20669,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="c"><a class="anchor" href="#c"></a>95. C/C++ Apache HBase Client</h2> +<h2 id="c"><a class="anchor" href="#c"></a>98. C/C++ Apache HBase Client</h2> <div class="sectionbody"> <div class="paragraph"> <p>FB’s Chip Turner wrote a pure C/C++ client. @@ -20446,7 +20681,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="jdo"><a class="anchor" href="#jdo"></a>96. Using Java Data Objects (JDO) with HBase</h2> +<h2 id="jdo"><a class="anchor" href="#jdo"></a>99. Using Java Data Objects (JDO) with HBase</h2> <div class="sectionbody"> <div class="paragraph"> <p><a href="https://db.apache.org/jdo/">Java Data Objects (JDO)</a> is a standard way to @@ -20603,10 +20838,10 @@ a row, get a column value, perform a query, and do some additional HBase operati </div> </div> <div class="sect1"> -<h2 id="scala"><a class="anchor" href="#scala"></a>97. Scala</h2> +<h2 id="scala"><a class="anchor" href="#scala"></a>100. 
Scala</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>97.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>100.1. Setting the Classpath</h3> <div class="paragraph"> <p>To use Scala with HBase, your CLASSPATH must include HBase’s classpath as well as the Scala JARs required by your code. First, use the following command on a server @@ -20631,7 +20866,7 @@ your project.</p> </div> </div> <div class="sect2"> -<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>97.2. Scala SBT File</h3> +<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>100.2. Scala SBT File</h3> <div class="paragraph"> <p>Your <code>build.sbt</code> file needs the following <code>resolvers</code> and <code>libraryDependencies</code> to work with HBase.</p> @@ -20650,7 +20885,7 @@ libraryDependencies ++= Seq( </div> </div> <div class="sect2"> -<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>97.3. Example Scala Code</h3> +<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>100.3. Example Scala Code</h3> <div class="paragraph"> <p>This example lists HBase tables, creates a new table, and adds a row to it.</p> </div> @@ -20688,10 +20923,10 @@ println(Bytes.toString(value))</code></pre> </div> </div> <div class="sect1"> -<h2 id="jython"><a class="anchor" href="#jython"></a>98. Jython</h2> +<h2 id="jython"><a class="anchor" href="#jython"></a>101. Jython</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>98.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>101.1. 
Setting the Classpath</h3> <div class="paragraph"> <p>To use Jython with HBase, your CLASSPATH must include HBase’s classpath as well as the Jython JARs required by your code.</p> @@ -20711,7 +20946,7 @@ $ bin/hbase org.python.util.jython</p> </div> </div> <div class="sect2"> -<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>98.2. Jython Code Examples</h3> +<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>101.2. Jython Code Examples</h3> <div class="exampleblock"> <div class="title">Example 27. Table Creation, Population, Get, and Delete with Jython</div> <div class="content"> @@ -20816,7 +21051,7 @@ The Thrift API relies on client and server processes.</p> </div> </div> <div class="sect1"> -<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>99. Filter Language</h2> +<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>102. Filter Language</h2> <div class="sectionbody"> <div class="paragraph"> <p>Thrift Filter Language was introduced in HBase 0.92. @@ -20827,7 +21062,7 @@ You can find out more about shell integration by using the <code>scan help</code <p>You specify a filter as a string, which is parsed on the server to construct the filter.</p> </div> <div class="sect2"> -<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>99.1. General Filter String Syntax</h3> +<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>102.1. General Filter String Syntax</h3> <div class="paragraph"> <p>A simple filter expression is expressed as a string:</p> </div> @@ -20862,7 +21097,7 @@ If single quotes are present in the argument, they must be escaped by an additio </div> </div> <div class="sect2"> -<h3 id="_compound_filters_and_operators"><a class="anchor" href="#_compound_filters_and_operators"></a>99.2. 
Compound Filters and Operators</h3> +<h3 id="_compound_filters_and_operators"><a class="anchor" href="#_compound_filters_and_operators"></a>102.2. Compound Filters and Operators</h3> <div class="dlist"> <div class="title">Binary Operators</div> <dl> @@ -20904,7 +21139,7 @@ If single quotes are present in the argument, they must be escaped by an additio </div> </div> <div class="sect2"> -<h3 id="_order_of_evaluation"><a class="anchor" href="#_order_of_evaluation"></a>99.3. Order of Evaluation</h3> +<h3 id="_order_of_evaluation"><a class="anchor" href="#_order_of_evaluation"></a>102.3. Order of Evaluation</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -20942,7 +21177,7 @@ is evaluated as </div> </div> <div class="sect2"> -<h3 id="_compare_operator"><a class="anchor" href="#_compare_operator"></a>99.4. Compare Operator</h3> +<h3 id="_compare_operator"><a class="anchor" href="#_compare_operator"></a>102.4. Compare Operator</h3> <div class="paragraph"> <p>The following compare operators are provided:</p> </div> @@ -20976,7 +21211,7 @@ is evaluated as </div> </div> <div class="sect2"> -<h3 id="_comparator"><a class="anchor" href="#_comparator"></a>99.5. Comparator</h3> +<h3 id="_comparator"><a class="anchor" href="#_comparator"></a>102.5. Comparator</h3> <div class="paragraph"> <p>A comparator can be any of the following:</p> </div> @@ -21044,7 +21279,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="examplephpclientprogram"><a class="anchor" href="#examplephpclientprogram"></a>99.6. Example PHP Client Program that uses the Filter Language</h3> +<h3 id="examplephpclientprogram"><a class="anchor" href="#examplephpclientprogram"></a>102.6. 
Example PHP Client Program that uses the Filter Language</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="php"><span class="inline-delimiter"><?</span> @@ -21067,7 +21302,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="_example_filter_strings"><a class="anchor" href="#_example_filter_strings"></a>99.7. Example Filter Strings</h3> +<h3 id="_example_filter_strings"><a class="anchor" href="#_example_filter_strings"></a>102.7. Example Filter Strings</h3> <div class="ulist"> <ul> <li> @@ -21116,7 +21351,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="individualfiltersyntax"><a class="anchor" href="#individualfiltersyntax"></a>99.8. Individual Filter Syntax</h3> +<h3 id="individualfiltersyntax"><a class="anchor" href="#individualfiltersyntax"></a>102.8. Individual Filter Syntax</h3> <div class="dlist"> <dl> <dt class="hdlist1">KeyOnlyFilter</dt> @@ -21259,7 +21494,7 @@ application.</p> </div> </div> <div class="sect1"> -<h2 id="_basic_spark"><a class="anchor" href="#_basic_spark"></a>100. Basic Spark</h2> +<h2 id="_basic_spark"><a class="anchor" href="#_basic_spark"></a>103. Basic Spark</h2> <div class="sectionbody"> <div class="paragraph"> <p>This section discusses Spark HBase integration at the lowest and simplest levels. @@ -21390,7 +21625,7 @@ access to HBase</p> </div> </div> <div class="sect1"> -<h2 id="_spark_streaming"><a class="anchor" href="#_spark_streaming"></a>101. Spark Streaming</h2> +<h2 id="_spark_streaming"><a class="anchor" href="#_spark_streaming"></a>104. 
Spark Streaming</h2> <div class="sectionbody"> <div class="paragraph"> <p><a href="https://spark.apache.org/streaming/">Spark Streaming</a> is a micro batching stream @@ -21487,7 +21722,7 @@ to the HBase Connections in the executors </div> </div> <div class="sect1"> -<h2 id="_bulk_load"><a class="anchor" href="#_bulk_load"></a>102. Bulk Load</h2> +<h2 id="_bulk_load"><a class="anchor" href="#_bulk_load"></a>105. Bulk Load</h2> <div class="sectionbody"> <div class="paragraph"> <p>There are two options for bulk loading data into HBase with Spark. There is the @@ -21695,7 +21930,7 @@ values for this row for all column families.</p> </div> </div> <div class="sect1"> -<h2 id="_sparksql_dataframes"><a class="anchor" href="#_sparksql_dataframes"></a>103. SparkSQL/DataFrames</h2> +<h2 id="_sparksql_dataframes"><a class="anchor" href="#_sparksql_dataframes"></a>106. SparkSQL/DataFrames</h2> <div class="sectionbody"> <div class="paragraph"> <p>HBase-Spark Connector (in HBase-Spark Module) leverages @@ -21715,7 +21950,7 @@ then load HBase DataFrame. After that, users can do integrated query and access in HBase table with SQL query. Following illustrates the basic procedure.</p> </div> <div class="sect2"> -<h3 id="_define_catalog"><a class="anchor" href="#_define_catalog"></a>103.1. Define catalog</h3> +<h3 id="_define_catalog"><a class="anchor" href="#_define_catalog"></a>106.1. Define catalog</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">def catalog = s"""{ @@ -21744,7 +21979,7 @@ also has to be defined in details as a column (col0), which has a specific cf (r </div> </div> <div class="sect2"> -<h3 id="_save_the_dataframe"><a class="anchor" href="#_save_the_dataframe"></a>103.2. Save the DataFrame</h3> +<h3 id="_save_the_dataframe"><a class="anchor" href="#_save_the_dataframe"></a>106.2. 
Save the DataFrame</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">case class HBaseRecord( @@ -21791,7 +22026,7 @@ will create an HBase table with 5 regions and save the DataFrame inside.</p> </div> </div> <div class="sect2"> -<h3 id="_load_the_dataframe"><a class="anchor" href="#_load_the_dataframe"></a>103.3. Load the DataFrame</h3> +<h3 id="_load_the_dataframe"><a class="anchor" href="#_load_the_dataframe"></a>106.3. Load the DataFrame</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">def withCatalog(cat: String): DataFrame = { @@ -21815,7 +22050,7 @@ by <code>withCatalog</code> function could be used to access HBase table, such a </div> </div> <div class="sect2"> -<h3 id="_language_integrated_query"><a class="anchor" href="#_language_integrated_query"></a>103.4. Language Integrated Query</h3> +<h3 id="_language_integrated_query"><a class="anchor" href="#_language_integrated_query"></a>106.4. Language Integrated Query</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") || @@ -21832,7 +22067,7 @@ s.show</code></pre> </div> </div> <div class="sect2"> -<h3 id="_sql_query"><a class="anchor" href="#_sql_query"></a>103.5. SQL Query</h3> +<h3 id="_sql_query"><a class="anchor" href="#_sql_query"></a>106.5. SQL Query</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">df.registerTempTable("table1") @@ -21846,7 +22081,7 @@ The lifetime of this temporary table is tied to the SQLContext that was used to </div> </div> <div class="sect2"> -<h3 id="_others"><a class="anchor" href="#_others"></a>103.6. Others</h3> +<h3 id="_others"><a class="anchor" href="#_others"></a>106.6. Others</h3> <div class="exampleblock"> <div class="title">Example 36. 
Query with different timestamps</div> <div class="content"> @@ -22094,7 +22329,7 @@ coprocessor can severely degrade cluster performance and stability.</p> </div> </div> <div class="sect1"> -<h2 id="_coprocessor_overview"><a class="anchor" href="#_coprocessor_overview"></a>104. Coprocessor Overview</h2> +<h2 id="_coprocessor_overview"><a class="anchor" href="#_coprocessor_overview"></a>107. Coprocessor Overview</h2> <div class="sectionbody"> <div class="paragraph"> <p>In HBase, you fetch data using a <code>Get</code> or <code>Scan</code>, whereas in an RDBMS you use a SQL @@ -22120,7 +22355,7 @@ data, and returns the result to the client.</p> are some analogies which may help to explain some of the benefits of coprocessors.</p> </div> <div class="sect2"> -<h3 id="cp_analogies"><a class="anchor" href="#cp_analogies"></a>104.1. Coprocessor Analogies</h3> +<h3 id="cp_analogies"><a class="anchor" href="#cp_analogies"></a>107.1. Coprocessor Analogies</h3> <div class="dlist"> <dl> <dt class="hdlist1">Triggers and Stored Procedure</dt> @@ -22146,7 +22381,7 @@ before passing the request on to its final destination (or even changing the des </div> </div> <div class="sect2"> -<h3 id="_coprocessor_implementation_overview"><a class="anchor" href="#_coprocessor_implementation_overview"></a>104.2. Coprocessor Implementation Overview</h3> +<h3 id="_coprocessor_implementation_overview"><a class="anchor" href="#_coprocessor_implementation_overview"></a>107.2. Coprocessor Implementation Overview</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -22174,10 +22409,10 @@ package.</p> </div> </div> <div class="sect1"> -<h2 id="_types_of_coprocessors"><a class="anchor" href="#_types_of_coprocessors"></a>105. Types of Coprocessors</h2> +<h2 id="_types_of_coprocessors"><a class="anchor" href="#_types_of_coprocessors"></a>108. 
Types of Coprocessors</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_observer_coprocessors"><a class="anchor" href="#_observer_coprocessors"></a>105.1. Observer Coprocessors</h3> +<h3 id="_observer_coprocessors"><a class="anchor" href="#_observer_coprocessors"></a>108.1. Observer Coprocessors</h3> <div class="paragraph"> <p>Observer coprocessors are triggered either before or after a specific event occurs. Observers that happen before an event use methods that start with a <code>pre</code> prefix, @@ -22185,7 +22420,7 @@ such as <a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/cop with a <code>post</code> prefix, such as <a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-"><code>postPut</code></a>.</p> </div> <div class="sect3"> -<h4 id="_use_cases_for_observer_coprocessors"><a class="anchor" href="#_use_cases_for_observer_coprocessors"></a>105.1.1. Use Cases for Observer Coprocessors</h4> +<h4 id="_use_cases_for_observer_coprocessors"><a class="anchor" href="#_use_cases_for_observer_coprocessors"></a>108.1.1. Use Cases for Observer Coprocessors</h4> <div class="dlist"> <dl> <dt class="hdlist1">Security</dt> @@ -22210,7 +22445,7 @@ a coprocessor to use the <code>prePut</code> method on <code>user</code> to inse </div> </div> <div class="sect3"> -<h4 id="_types_of_observer_coprocessor"><a class="anchor" href="#_types_of_observer_coprocessor"></a>105.1.2. Types of Observer Coprocessor</h4> +<h4 id="_types_of_observer_coprocessor"><a class="anchor" href="#_types_of_observer_coprocessor"></a>108.1.2. Types of Observer Coprocessor</h4> <div class="dlist"> <dl> <dt class="hdlist1">RegionObserver</dt> @@ -22246,7 +22481,7 @@ Log (WAL). 
See </div> </div> <div class="sect2"> -<h3 id="cpeps"><a class="anchor" href="#cpeps"></a>105.2. Endpoint Coprocessor</h3> +<h3 id="cpeps"><a class="anchor" href="#cpeps"></a>108.2. Endpoint Coprocessor</h3> <div class="paragraph"> <p>Endpoint processors allow you to perform computation at the location of the data. See <a href="#cp_analogies">Coprocessor Analogy</a>. An example is the need to calculate a running @@ -22290,14 +22525,14 @@ change.</p> </div> </div> <div class="sect1"> -<h2 id="cp_loading"><a class="anchor" href="#cp_loading"></a>106. Loading Coprocessors</h2> +<h2 id="cp_loading"><a class="anchor" href="#cp_loading"></a>109. Loading Coprocessors</h2> <div class="sectionbody"> <div class="paragraph"> <p>To make your coprocessor available to HBase, it must be <em>loaded</em>, either statically (through the HBase configuration) or dynamically (using HBase Shell or the Java API).</p> </div> <div class="sect2"> -<h3 id="_static_loading"><a class="anchor" href="#_static_loading"></a>106.1. Static Loading</h3> +<h3 id="_static_loading"><a class="anchor" href="#_static_loading"></a>109.1. Static Loading</h3> <div class="paragraph"> <p>Follow these steps to statically load your coprocessor. Keep in mind that you must restart HBase to unload a coprocessor that has been loaded statically.</p> @@ -22366,7 +22601,7 @@ HBase installation.</p> </div> </div> <div class="sect2"> -<h3 id="_static_unloading"><a class="anchor" href="#_static_unloading"></a>106.2. Static Unloading</h3> +<h3 id="_static_unloading"><a class="anchor" href="#_static_unloading"></a>109.2. Static Unloading</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -22383,7 +22618,7 @@ directory.</p> </div> </div> <div class="sect2"> -<h3 id="_dynamic_loading"><a class="anchor" href="#_dynamic_loading"></a>106.3. Dynamic Loading</h3> +<h3 id="_dynamic_loading"><a class="anchor" href="#_dynamic_loading"></a>109.3. 
Dynamic Loading</h3> <div class="paragraph"> <p>You can also load a coprocessor dynamically, without restarting HBase. This may seem preferable to static loading, but dynamically loaded coprocessors are loaded on a @@ -22425,7 +22660,7 @@ dependencies.</p> </table> </div> <div class="sect3"> -<h4 id="load_coprocessor_in_shell"><a class="anchor" href="#load_coprocessor_in_shell"></a>106.3.1. Using HBase Shell</h4> +<h4 id="load_coprocessor_in_shell"><a class="anchor" href="#load_coprocessor_in_shell"></a>109.3.1. Using HBase Shell</h4> <div class="olist arabic"> <ol class="arabic"> <li> @@ -22501,7 +22736,7 @@ case the framework will assign a default priority value.</p> </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api_all_hbase_versions"><a class="anchor" href="#_using_the_java_api_all_hbase_versions"></a>106.3.2. Using the Java API (all HBase versions)</h4> +<h4 id="_using_the_java_api_all_hbase_versions"><a class="anchor" href="#_using_the_java_api_all_hbase_versions"></a>109.3.2. Using the Java API (all HBase versions)</h4> <div class="paragraph"> <p>The following Java code shows how to use the <code>setValue()</code> method of <code>HTableDescriptor</code> to load a coprocessor on the <code>users</code> table.</p> @@ -22530,7 +22765,7 @@ admin.enableTable(tableName);</code></pre> </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api_hbase_0_96_only"><a class="anchor" href="#_using_the_java_api_hbase_0_96_only"></a>106.3.3. Using the Java API (HBase 0.96+ only)</h4> +<h4 id="_using_the_java_api_hbase_0_96_only"><a class="anchor" href="#_using_the_java_api_hbase_0_96_only"></a>109.3.3. Using the Java API (HBase 0.96+ only)</h4> <div class="paragraph"> <p>In HBase 0.96 and newer, the <code>addCoprocessor()</code> method of <code>HTableDescriptor</code> provides an easier way to load a coprocessor dynamically.</p> @@ -22573,9 +22808,9 @@ verifies whether the given class is actually contained in the jar file. 
</div> </div> <div class="sect2"> -<h3 id="_dynamic_unloading"><a class="anchor" href="#_dynamic_unloading"></a>106.4. Dynamic Unloading</h3> +<h3 id="_dynamic_unloading"><a class="anchor" href="#_dynamic_unloading"></a>109.4. Dynamic Unloading</h3> <div class="sect3"> -<h4 id="_using_hbase_shell"><a class="anchor" href="#_using_hbase_shell"></a>106.4.1. Using HBase Shell</h4> +<h4 id="_using_hbase_shell"><a class="anchor" href="#_using_hbase_shell"></a>109.4.1. Using HBase Shell</h4> <div class="olist arabic"> <ol class="arabic"> <li> @@ -22606,7 +22841,7 @@ verifies whether the given class is actually contained in the jar file. </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api"><a class="anchor" href="#_using_the_java_api"></a>106.4.2. Using the Java API</h4> +<h4 id="_using_the_java_api"><a class="anchor" href="#_using_the_java_api"></a>109.4.2. Using the Java API</h4> <div class="paragraph"> <p>Reload the table definition without setting the value of the coprocessor either by using <code>setValue()</code> or <code>addCoprocessor()</code> methods. This will remove any coprocessor @@ -22640,7 +22875,7 @@ admin.enableTable(tableName);</code></pre> </div> </div> <div class="sect1"> -<h2 id="cp_example"><a class="anchor" href="#cp_example"></a>107. Examples</h2> +<h2 id="cp_example"><a class="anchor" href="#cp_example"></a>110. Examples</h2> <div class="sectionbody"> <div class="paragraph"> <p>HBase ships examples for Observer Coprocessor.</p> @@ -22711,7 +22946,7 @@ of the <code>users</code> table.</p> </tbody> </table> <div class="sect2"> -<h3 id="_observer_example"><a class="anchor" href="#_observer_example"></a>107.1. Observer Example</h3> +<h3 id="_observer_example"><a class="anchor" href="#_observer_example"></a>110.1. 
Observer Example</h3> <div class="paragraph"> <p>The following Observer coprocessor prevents the details of the user <code>admin</code> from being returned in a <code>Get</code> or <code>Scan</code> of the <code>users</code> table.</p> @@ -22809,7 +23044,7 @@ remove any <code>admin</code> results from the scan:</p> </div> </div> <div class="sect2"> -<h3 id="_endpoint_example"><a class="anchor" href="#_endpoint_example"></a>107.2. Endpoint Example</h3> +<h3 id="_endpoint_example"><a class="anchor" href="#_endpoint_example"></a>110.2. Endpoint Example</h3> <div class="paragraph"> <p>Still using the <code>users</code> table, this example implements a coprocessor to calculate the sum of all employee salaries, using an endpoint coprocessor.</p> @@ -22983,7 +23218,7 @@ Table table = connection.getTable(tableName); </div> </div> <div class="sect1"> -<h2 id="_guidelines_for_deploying_a_coprocessor"><a class="anchor" href="#_guidelines_for_deploying_a_coprocessor"></a>108. Guidelines For Deploying A Coprocessor</h2> +<h2 id="_guidelines_for_deploying_a_coprocessor"><a class="anchor" href="#_guidelines_for_deploying_a_coprocessor"></a>111. Guidelines For Deploying A Coprocessor</h2> <div class="sectionbody"> <div class="dlist"> <dl> @@ -23071,7 +23306,7 @@ ResultScanner scanner = table.getScanner(scan); </div> </div> <div class="sect1"> -<h2 id="_restricting_coprocessor_usage"><a class="anchor" href="#_restricting_coprocessor_usage"></a>109. Restricting Coprocessor Usage</h2> +<h2 id="_restricting_coprocessor_usage"><a class="anchor" href="#_restricting_coprocessor_usage"></a>112. Restricting Coprocessor Usage</h2> <div class="sectionbody"> <div class="paragraph"> <p>Restricting arbitrary user coprocessors can be a big concern in multitenant environments. 
HBase provides a continuum of options for ensuring only expected coprocessors are running:</p> @@ -23141,30 +23376,30 @@ ResultScanner scanner = table.getScanner(scan); </div> <h1 id="performance" class="sect0"><a class="anchor" href="#performance"></a>Apache HBase Performance Tuning</h1> <div class="sect1"> -<h2 id="perf.os"><a class="anchor" href="#perf.os"></a>110. Operating System</h2> +<h2 id="perf.os"><a class="anchor" href="#perf.os"></a>113. Operating System</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="perf.os.ram"><a class="anchor" href="#perf.os.ram"></a>110.1. Memory</h3> +<h3 id="perf.os.ram"><a class="anchor" href="#perf.os.ram"></a>113.1. Memory</h3> <div class="paragraph"> <p>RAM, RAM, RAM. Don’t starve HBase.</p> </div> </div> <div class="sect2"> -<h3 id="perf.os.64"><a class="anchor" href="#perf.os.64"></a>110.2. 64-bit</h3> +<h3 id="perf.os.64"><a class="anchor" href="#perf.os.64"></a>113.2. 64-bit</h3> <div class="paragraph"> <p>Use a 64-bit platform (and 64-bit JVM).</p> </div> </div> <div class="sect2"> -<h3 id="perf.os.swap"><a class="anchor" href="#perf.os.swap"></a>110.3. Swapping</h3> +<h3 id="perf.os.swap"><a class="anchor" href="#perf.os.swap"></a>113.3. Swapping</h3> <div class="paragraph"> <p>Watch out for swapping. Set <code>swappiness</code> to 0.</p> </div> </div> <div class="sect2"> -<h3 id="perf.os.cpu"><a class="anchor" href="#perf.os.cpu"></a>110.4. CPU</h3> +<h3 id="perf.os.cpu"><a class="anchor" href="#perf.os.cpu"></a>113.4. CPU</h3> <div class="paragraph"> <p>Make sure you have set up your Hadoop to use native, hardware checksumming. See link:[hadoop.native.lib].</p> @@ -23173,7 +23408,7 @@ See link:[hadoop.native.lib].</p> </div> </div> <div class="sect1"> -<h2 id="perf.network"><a class="anchor" href="#perf.network"></a>111. Network</h2> +<h2 id="perf.network"><a class="anchor" href="#perf.network"></a>114. 
Network</h2> <div class="sectionbody"> <div class="paragraph"> <p>Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase performance is the switching hardware that is used, decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).</p> @@ -23195,14 +23430,14 @@ See link:[hadoop.native.lib].</p> </ul> </div> <div class="sect2"> -<h3 id="perf.network.1switch"><a class="anchor" href="#perf.network.1switch"></a>111.1. Single Switch</h3> +<h3 id="perf.network.1switch"><a class="anchor" href="#perf.network.1switch"></a>114.1. Single Switch</h3> <div class="paragraph"> <p>The single most important factor in this configuration is that the switching capacity of the hardware is capable of handling the traffic which can be generated by all systems connected to the switch. Some lower priced commodity hardware can have a slower switching capacity than could be utilized by a full switch.</p> </div> </div> <div class="sect2"> -<h3 id="perf.network.2switch"><a class="anchor" href="#perf.network.2switch"></a>111.2. Multiple Switches</h3> +<h3 id="perf.network.2switch"><a class="anchor" href="#perf.network.2switch"></a>114.2. Multiple Switches</h3> <div class="paragraph"> <p>Multiple switches are a potential pitfall in the architecture. The most common configuration of lower priced hardware is a simple 1Gbps uplink from one switch to another. @@ -23228,7 +23463,7 @@ single 48 port as opposed to 2x 24 port</p> </div> </div> <div class="sect2"> -<h3 id="perf.network.multirack"><a class="anchor" href="#perf.network.multirack"></a>111.3. Multiple Racks</h3> +<h3 id="perf.network.multirack"><a class="anchor" href="#perf.network.multirack"></a>114.3. 
Multiple Racks</h3> <div class="paragraph"> <p>Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:</p> </div> @@ -23253,13 +23488,13 @@ An example of this is, creating an 8Gbps port channel from rack A to rack B, usi </div> </div> <div class="sect2"> -<h3 id="perf.network.ints"><a class="anchor" href="#perf.network.ints"></a>111.4. Network Interfaces</h3> +<h3 id="perf.network.ints"><a class="anchor" href="#perf.network.ints"></a>114.4. Network Interfaces</h3> <div class="paragraph"> <p>Are all the network interfaces functioning correctly? Are you sure? See the Troubleshooting Case Study in <a href="#casestudies.slownode">Case Study #1 (Performance Issue On A Single Node)</a>.</p> </div> </div> <div class="sect2"> -<h3 id="perf.network.call_me_maybe"><a class="anchor" href="#perf.network.call_me_maybe"></a>111.5. Network Consistency and Partition Tolerance</h3> +<h3 id="perf.network.call_me_maybe"><a class="anchor" href="#perf.network.call_me_maybe"></a>114.5. Network Consistency and Partition Tolerance</h3> <div class="paragraph"> <p>The <a href="http://en.wikipedia.org/wiki/CAP_theorem">CAP Theorem</a> states that a distributed system can maintain two out of the following three characteristics: - *C*onsistency — all nodes see the same data. @@ -23276,12 +23511,12 @@ An example of this is, creating an 8Gbps port channel from rack A to rack B, usi </div> </div> <div class="sect1"> -<h2 id="jvm"><a class="anchor" href="#jvm"></a>112. Java</h2> +<h2 id="jvm"><a class="anchor" href="#jvm"></a>115. Java</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="gc"><a class="anchor" href="#gc"></a>112.1. The Garbage Collector and Apache HBase</h3> +<h3 id="gc"><a class="anchor" href="#gc"></a>115.1. The Garbage Collector and Apache HBase</h3> <div class="sect3"> -<h4 id="gcpause"><a class="anchor" href="#gcpause"></a>112.1.1. 
Long GC pauses</h4> +<h4 id="gcpause"><a class="anchor" href="#gcpause"></a>115.1.1. Long GC pauses</h4> <div class="paragraph"> <p>In his presentation, <a href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</a>, Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading; CMS failure modes and old generation heap fragmentation brought.</p> </div> @@ -23319,38 +23554,38 @@ See <a href="#block.cache">Block Cache</a></p> </div> </div> <div class="sect1"> -<h2 id="perf.configurations"><a class="anchor" href="#perf.configurations"></a>113. HBase Configurations</h2> +<h2 id="perf.configurations"><a class="anchor" href="#perf.configurations"></a>116. HBase Configurations</h2> <div class="sectionbody"> <div class="paragraph"> <p>See <a href="#recommended_configurations">Recommended Configurations</a>.</p> </div> <div class="sect2"> -<h3 id="perf.99th.percentile"><a class="anchor" href="#perf.99th.percentile"></a>113.1. Improving the 99th Percentile</h3> +<h3 id="perf.99th.percentile"><a class="anchor" href="#perf.99th.percentile"></a>116.1. Improving the 99th Percentile</h3> <div class="paragraph"> <p>Try link:[hedged_reads].</p> </div> </div> <div class="sect2"> -<h3 id="perf.compactions.and.splits"><a class="anchor" href="#perf.compactions.and.splits"></a>113.2. Managing Compactions</h3> +<h3 id="perf.compactions.and.splits"><a class="anchor" href="#perf.compactions.and.splits"></a>116.2. Managing Compactions</h3> <div class="paragraph"> <p>For larger systems, managing link:[compactions and splits] may be something you want to consider.</p> </div> </div> <div class="sect2"> -<h3 id="perf.handlers"><a class="anchor" href="#perf.handlers"></a>113.3. <code>hbase.regionserver.handler.count</code></h3> +<h3 id="perf.handlers"><a class="anchor" href="#perf.handlers"></a>116.3. 
<code>hbase.regionserver.handler.count</code></h3> <div class="paragraph"> <p>See <a href="#hbase.regionserver.handler.count">hbase.regionserver.handler.count</a>.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hfile.block.cache.size"><a class="anchor" href="#perf.hfile.block.cache.size"></a>113.4. <code>hfile.block.cache.size</code></h3> +<h3 id="perf.hfile.block.cache.size"><a class="anchor" href="#perf.hfile.block.cache.size"></a>116.4. <code>hfile.block.cache.size</code></h3> <div class="paragraph"> <p>See <a href="#hfile.block.cache.size">hfile.block.cache.size</a>. A memory setting for the RegionServer process.</p> </div> </div> <div class="sect2"> -<h3 id="blockcache.prefetch"><a class="anchor" href="#blockcache.prefetch"></a>113.5. Prefetch Option for Blockcache</h3> +<h3 id="blockcache.prefetch"><a class="anchor" href="#blockcache.prefetch"></a>116.5. Prefetch Option for Blockcache</h3> <div class="paragraph"> <p><a href="https://issues.apache.org/jira/browse/HBASE-9857">HBASE-9857</a> adds a new option to prefetch HFile contents when opening the BlockCache, if a Column family or RegionServer property is set. This option is available for HBase 0.98.3 and later. @@ -23393,35 +23628,35 @@ or on <code>org.apache.hadoop.hbase.io.hfile.HFileReaderV2</code> in earlier ver </div> </div> <div class="sect2"> -<h3 id="perf.rs.memstore.size"><a class="anchor" href="#perf.rs.memstore.size"></a>113.6. <code>hbase.regionserver.global.memstore.size</code></h3> +<h3 id="perf.rs.memstore.size"><a class="anchor" href="#perf.rs.memstore.size"></a>116.6. <code>hbase.regionserver.global.memstore.size</code></h3> <div class="paragraph"> <p>See <a href="#hbase.regionserver.global.memstore.size">hbase.regionserver.global.memstore.size</a>. 
This memory setting is often adjusted for the RegionServer process depending on needs.</p> </div> </div> <div class="sect2"> -<h3 id="perf.rs.memstore.size.lower.limit"><a class="anchor" href="#perf.rs.memstore.size.lower.limit"></a>113.7. <code>hbase.regionserver.global.memstore.size.lower.limit</code></h3> +<h3 id="perf.rs.memstore.size.lower.limit"><a class="anchor" href="#perf.rs.memstore.size.lower.limit"></a>116.7. <code>hbase.regionserver.global.memstore.size.lower.limit</code></h3> <div class="paragraph"> <p>See <a href="#hbase.regionserver.global.memstore.size.lower.limit">hbase.regionserver.global.memstore.size.lower.limit</a>. This memory setting is often adjusted for the RegionServer process depending on needs.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hstore.blockingstorefiles"><a class="anchor" href="#perf.hstore.blockingstorefiles"></a>113.8. <code>hbase.hstore.blockingStoreFiles</code></h3> +<h3 id="perf.hstore.blockingstorefiles"><a class="anchor" href="#perf.hstore.blockingstorefiles"></a>116.8. <code>hbase.hstore.blockingStoreFiles</code></h3> <div class="paragraph"> <p>See <a href="#hbase.hstore.blockingStoreFiles">hbase.hstore.blockingStoreFiles</a>. If there is blocking in the RegionServer logs, increasing this can help.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hregion.memstore.block.multiplier"><a class="anchor" href="#perf.hregion.memstore.block.multiplier"></a>113.9. <code>hbase.hregion.memstore.block.multiplier</code></h3> +<h3 id="perf.hregion.memstore.block.multiplier"><a class="anchor" href="#perf.hregion.memstore.block.multiplier"></a>116.9. <code>hbase.hregion.memstore.block.multiplier</code></h3> <div class="paragraph"> <p>See <a href="#hbase.hregion.memstore.block.multiplier">hbase.hregion.memstore.block.multiplier</a>. 
If there is enough RAM, increasing this can help.</p> </div> </div> <div class="sect2"> -<h3 id="hbase.regionserver.checksum.verify.performance"><a class="anchor" href="#hbase.regionserver.checksum.verify.performance"></a>113.10. <code>hbase.regionserver.checksum.verify</code></h3> +<h3 id="hbase.regionserver.checksum.verify.performance"><a class="anchor" href="#hbase.regionserver.checksum.verify.performance"></a>116.10. <code>hbase.regionserver.checksum.verify</code></h3> <div class="paragraph"> <p>Have HBase write the checksum into the datablock and save having to do the checksum seek whenever you read.</p> </div> @@ -23430,7 +23665,7 @@ If there is enough RAM, increasing this can help.</p> </div> </div> <div class="sect2"> -<h3 id="_tuning_code_callqueue_code_options"><a class="anchor" href="#_tuning_code_callqueue_code_options"></a>113.11. Tuning <code>callQueue</code> Options</h3> +<h3 id="_tuning_code_callqueue_code_options"><a class="anchor" href="#_tuning_code_callqueue_code_options"></a>116.11. Tuning <code>callQueue</code> Options</h3> <div class="paragraph"> <p><a href="https://issues.apache.org/jira/browse/HBASE-11355">HBASE-11355</a> introduces several callQueue tuning mechanisms which can increase performance. See the JIRA for some benchmarking information.</p> @@ -23524,7 +23759,7 @@ These parameters are intended for testing purposes and should be used carefully. </div> </div> <div class="sect1"> -<h2 id="perf.zookeeper"><a class="anchor" href="#perf.zookeeper"></a>114. ZooKeeper</h2> +<h2 id="perf.zookeeper"><a class="anchor" href="#perf.zookeeper"></a>117. ZooKeeper</h2> <div class="sectionbody"> <div class="paragraph"> <p>See <a href="#zookeeper">ZooKeeper</a> for information on configuring ZooKeeper, and see the part about having a dedicated disk.</p> @@ -23532,23 +23767,23 @@ These parameters are intended for testing purposes and should be used carefully. 
</div> </div> <div class="sect1"> -<h2 id="perf.schema"><a class="anchor" href="#perf.schema"></a>115. Schema Design</h2> +<h2 id="perf.schema"><a class="anchor" href="#perf.schema"></a>118. Schema Design</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="perf.number.of.cfs"><a class="anchor" href="#perf.number.of.cfs"></a>115.1. Number of Column Families</h3> +<h3 id="perf.number.of.cfs"><a class="anchor" href="#perf.number.of.cfs"></a>118.1. Number of Column Families</h3> <div class="paragraph"> <p>See <a href="#number.of.cfs">On the number of column families</a>.</p> </div> </div> <div class="sect2"> -<h3 id="perf.schema.keys"><a class="anchor" href="#perf.schema.keys"></a>115.2. Key and Attribute Lengths</h3> +<h3 id="perf.schema.keys"><a class="anchor" href="#perf.schema.keys"></a>118.2. Key and Attribute Lengths</h3> <div class="paragraph"> <p>See <a href="#keysize">Try to minimize row and column sizes</a>. See also <a href="#perf.compression.however">However…​</a> for compression caveats.</p> </div> </div> <div class="sect2"> -<h3 id="schema.regionsize"><a class="anchor" href="#schema.regionsize"></a>115.3. Table RegionSize</h3> +<h3 id="schema.regionsize"><a class="anchor" href="#schema.regionsize"></a>118.3. Table RegionSize</h3> <div class="paragraph"> <p>The regionsize can be set on a per-table basis via <code>setFileSize</code> on <a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</a> in the event where certain tables require different regionsizes than the configured default regionsize.</p> </div> @@ -23557,7 +23792,7 @@ See also <a href="#perf.compression.however">However…​</a> for compr </div> </div> <div class="sect2"> -<h3 id="schema.bloom"><a class="anchor" href="#schema.bloom"></a>115.4. Bloom Filters</h3> +<h3 id="schema.bloom"><a class="anchor" href="#schema.bloom"></a>118.4. 
Bloom Filters</h3> <div class="paragraph"> <p>A Bloom filter, named for its creator, Burton Howard Bloom, is a data structure which is designed to predict whether a given element is a member of a set of data. A positive result from a Bloom filter is not always accurate, but a negative result is guaranteed to be accurate. @@ -23584,7 +23819,7 @@ Since HBase 0.96, row-based Bloom filters are enabled by default. <p>For more information on Bloom filters in relation to HBase, see <a href="#blooms">Bloom Filters</a> for more information, or the following Quora discussion: <a href="http://www.quora.com/How-are-bloom-filters-used-in-HBase">How are bloom filters used in HBase?</a>.</p> </div> <div class="sect3"> -<h4 id="bloom.filters.when"><a class="anchor" href="#bloom.filters.when"></a>115.4.1. When To Use Bloom Filters</h4> +<h4 id="bloom.filters.when"><a class="anchor" href="#bloom.filters.when"></a>118.4.1. When To Use Bloom Filters</h4> <div class="paragraph"> <p>Since HBase 0.96, row-based Bloom filters are enabled by default. You may choose to disable them or to change some tables to use row+column Bloom filters, depending on the characteristics of your data and how it is loaded into HBase.</p> @@ -23609,7 +23844,7 @@ Bloom filters work best when the size of each data entry is at least a few kilob </div> </div> <div class="sect3"> -<h4 id="_enabling_bloom_filters"><a class="anchor" href="#_enabling_bloom_filters"></a>115.4.2. Enabling Bloom Filters</h4> +<h4 id="_enabling_bloom_filters"><a class="anchor" href="#_enabling_bloom_filters"></a>118.4.2. Enabling Bloom Filters</h4> <div class="paragraph"> <p>Bloom filters are enabled on a Column Family. You can do this by using the setBloomFilterType method of HColumnDescriptor or using the HBase API. 
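As a sketch of the API route (this assumes an HBase 1.x-era client, an already-open <code>Admin</code> handle named <code>admin</code>, and illustrative table/family names):

```java
// Create a table whose family "colfam1" uses a row+column Bloom filter.
// "admin" is assumed to be an open org.apache.hadoop.hbase.client.Admin.
HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("mytable"));
HColumnDescriptor colDesc = new HColumnDescriptor("colfam1");
colDesc.setBloomFilterType(BloomType.ROWCOL); // ROW is the default since 0.96
tableDesc.addFamily(colDesc);
admin.createTable(tableDesc);
```

The same effect is available from the HBase shell via the <code>BLOOMFILTER</code> attribute on the column family.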
@@ -23627,7 +23862,7 @@ See also the API documentation for <a href="https://hbase.apache.org/apidocs/org </div> </div> <div class="sect3"> -<h4 id="_configuring_server_wide_behavior_of_bloom_filters"><a class="anchor" href="#_configuring_server_wide_behavior_of_bloom_filters"></a>115.4.3. Configuring Server-Wide Behavior of Bloom Filters</h4> +<h4 id="_configuring_server_wide_behavior_of_bloom_filters"><a class="anchor" href="#_configuring_server_wide_behavior_of_bloom_filters"></a>118.4.3. Configuring Server-Wide Behavior of Bloom Filters</h4> <div class="paragraph"> <p>You can configure the following settings in the <em>hbase-site.xml</em>.</p> </div> @@ -23689,7 +23924,7 @@ See also the API documentation for <a href="https://hbase.apache.org/apidocs/org </div> </div> <div class="sect2"> -<h3 id="schema.cf.blocksize"><a class="anchor" href="#schema.cf.blocksize"></a>115.5. ColumnFamily BlockSize</h3> +<h3 id="schema.cf.blocksize"><a class="anchor" href="#schema.cf.blocksize"></a>118.5. ColumnFamily BlockSize</h3> <div class="paragraph"> <p>The blocksize can be configured for each ColumnFamily in a table, and defaults to 64k. Larger cell values require larger blocksizes. @@ -23700,7 +23935,7 @@ There is an inverse relationship between blocksize and the resulting StoreFile i </div> </div> <div class="sect2"> -<h3 id="cf.in.memory"><a class="anchor" href="#cf.in.memory"></a>115.6. In-Memory ColumnFamilies</h3> +<h3 id="cf.in.memory"><a class="anchor" href="#cf.in.memory"></a>118.6. In-Memory ColumnFamilies</h3> <div class="paragraph"> <p>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. @@ -23711,13 +23946,13 @@ In-memory blocks have the highest priority in the <a href="#block.cache">Block C </div> </div> <div class="sect2"> -<h3 id="perf.compression"><a class="anchor" href="#perf.compression"></a>115.7. 
Compression</h3> +<h3 id="perf.compression"><a class="anchor" href="#perf.compression"></a>118.7. Compression</h3> <div class="paragraph"> <p>Production systems should use compression with their ColumnFamily definitions. See <a href="#compression">Compression and Data Block Encoding In HBase</a> for more information.</p> </div> <div class="sect3"> -<h4 id="perf.compression.however"><a class="anchor" href="#perf.compression.however"></a>115.7.1. However…​</h4> +<h4 id="perf.compression.however"><a class="anchor" href="#perf.compression.however"></a>118.7.1. However…​</h4> <div class="paragraph"> <p>Compression deflates data <em>on disk</em>. When it’s in-memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client) it’s inflated. @@ -23731,10 +23966,10 @@ So while using ColumnFamily compression is a best practice, but it’s not g </div> </div> <div class="sect1"> -<h2 id="perf.general"><a class="anchor" href="#perf.general"></a>116. HBase General Patterns</h2> +<h2 id="perf.general"><a class="anchor" href="#perf.general"></a>119. HBase General Patterns</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="perf.general.constants"><a class="anchor" href="#perf.general.constants"></a>116.1. Constants</h3> +<h3 id="perf.general.constants"><a class="anchor" href="#perf.general.constants"></a>119.1. Constants</h3> <div class="paragraph"> <p>When people get started with HBase they have a tendency to write code that looks like this:</p> </div> @@ -23763,10 +23998,10 @@ Get get = <span class="keyword">new</span> Get(rowkey); </div> </div> <div class="sect1"> -<h2 id="perf.writing"><a class="anchor" href="#perf.writing"></a>117. Writing to HBase</h2> +<h2 id="perf.writing"><a class="anchor" href="#perf.writing"></a>120. Writing to HBase</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="perf.batch.loading"><a class="anchor" href="#perf.batch.loading"></a>117.1. 
Batch Loading</h3> +<h3 id="perf.batch.loading"><a class="anchor" href="#perf.batch.loading"></a>120.1. Batch Loading</h3> <div class="paragraph"> <p>Use the bulk load tool if you can. See <a href="#arch.bulk.load">Bulk Loading</a>. @@ -23774,7 +24009,7 @@ Otherwise, pay attention to the below.</p> </div> </div> <div class="sect2"> -<h3 id="precreate.regions"><a class="anchor" href="#precreate.regions"></a>117.2. Table Creation: Pre-Creating Regions</h3> +<h3 id="precreate.regions"><a class="anchor" href="#precreate.regions"></a>120.2. Table Creation: Pre-Creating Regions</h3> <div class="paragraph"> <p>Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. @@ -23824,7 +24059,7 @@ See <a href="#tricks.pre-split">Pre-splitting tables with the HBase Shell</a> fo </div> </div> <div class="sect2"> -<h3 id="def.log.flush"><a class="anchor" href="#def.log.flush"></a>117.3. Table Creation: Deferred Log Flush</h3> +<h3 id="def.log.flush"><a class="anchor" href="#def.log.flush"></a>120.3. Table Creation: Deferred Log Flush</h3> <div class="paragraph"> <p>The default behavior for Puts using the Write Ahead Log (WAL) is that <code>WAL</code> edits will be written immediately. If deferred log flush is used, WAL edits are kept in memory until the flush period. @@ -23837,7 +24072,7 @@ The default value of <code>hbase.regionserver.optionallogflushinterval</code> is </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.putwal"><a class="anchor" href="#perf.hbase.client.putwal"></a>117.4. HBase Client: Turn off WAL on Puts</h3> +<h3 id="perf.hbase.client.putwal"><a class="anchor" href="#perf.hbase.client.putwal"></a>120.4. HBase Client: Turn off WAL on Puts</h3> <div class="paragraph"> <p>A frequent request is to disable the WAL to increase performance of Puts. 
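On recent clients this is expressed per-mutation through the <code>Durability</code> enum (a sketch only, not a recommendation; <code>table</code> is an assumed open <code>Table</code> handle and the row/family names are illustrative):

```java
// Skip the WAL for this single Put. If the RegionServer crashes before
// the MemStore flushes, this edit is permanently lost.
Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
put.setDurability(Durability.SKIP_WAL);
table.put(put);
```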
This is only appropriate for bulk loads, as it puts your data at risk by removing the protection of the WAL in the event of a region server crash. @@ -23862,14 +24097,14 @@ To disable the WAL, see <a href="#wal.disable">Disabling the WAL</a>.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.regiongroup"><a class="anchor" href="#perf.hbase.client.regiongroup"></a>117.5. HBase Client: Group Puts by RegionServer</h3> +<h3 id="perf.hbase.client.regiongroup"><a class="anchor" href="#perf.hbase.client.regiongroup"></a>120.5. HBase Client: Group Puts by RegionServer</h3> <div class="paragraph"> <p>In addition to using the writeBuffer, grouping <code>Put</code>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility <code>HTableUtil</code> currently on MASTER that does this, but you can either copy that or implement your own version for those still on 0.90.x or earlier.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.write.mr.reducer"><a class="anchor" href="#perf.hbase.write.mr.reducer"></a>117.6. MapReduce: Skip The Reducer</h3> +<h3 id="perf.hbase.write.mr.reducer"><a class="anchor" href="#perf.hbase.write.mr.reducer"></a>120.6. MapReduce: Skip The Reducer</h3> <div class="paragraph"> <p>When writing a lot of data to an HBase table from a MR job (e.g., with <a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</a>), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. @@ -23880,7 +24115,7 @@ It’s far more efficient to just write directly to HBase.</p> </div> </div> <div class="sect2"> -<h3 id="perf.one.region"><a class="anchor" href="#perf.one.region"></a>117.7. 
Anti-Pattern: One Hot Region</h3> +<h3 id="perf.one.region"><a class="anchor" href="#perf.one.region"></a>120.7. Anti-Pattern: One Hot Region</h3> <div class="paragraph"> <p>If all your data is being written to one region at a time, then re-read the section on processing timeseries data.</p> </div> @@ -23896,21 +24131,21 @@ As the HBase client communicates directly with the RegionServers, this can be ob </div> </div> <div class="sect1"> -<h2 id="perf.reading"><a class="anchor" href="#perf.reading"></a>118. Reading from HBase</h2> +<h2 id="perf.reading"><a class="anchor" href="#perf.reading"></a>121. Reading from HBase</h2> <div class="sectionbody"> <div class="paragraph"> <p>The mailing list can help if you are having performance issues. For example, here is a good general thread on what to look at when addressing read-time issues: <a href="http://search-hadoop.com/m/qOo2yyHtCC1">HBase Random Read latency > 100ms</a></p> </div> <div class="sect2"> -<h3 id="perf.hbase.client.caching"><a class="anchor" href="#perf.hbase.client.caching"></a>118.1. Scan Caching</h3> +<h3 id="perf.hbase.client.caching"><a class="anchor" href="#perf.hbase.client.caching"></a>121.1. Scan Caching</h3> <div class="paragraph"> <p>If HBase is used as an input source for a MapReduce job, for example, make sure that the input <a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</a> instance to the MapReduce job has <code>setCaching</code> set to something greater than the default (which is 1). Using the default value means that the map-task will make a call back to the region-server for every record processed. Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed. 
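In code, this is a single setting on the Scan handed to the job (a sketch; it assumes the standard <code>Scan</code> client API):

```java
Scan scan = new Scan();
scan.setCaching(500); // ship 500 rows per RPC instead of the default 1
// then hand "scan" to TableMapReduceUtil.initTableMapperJob(...) as usual
```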
There is a cost/benefit to have the cache value be large because it costs more in memory for both client and RegionServer, so bigger isn’t always better.</p> </div> <div class="sect3"> -<h4 id="perf.hbase.client.caching.mr"><a class="anchor" href="#perf.hbase.client.caching.mr"></a>118.1.1. Scan Caching in MapReduce Jobs</h4> +<h4 id="perf.hbase.client.caching.mr"><a class="anchor" href="#perf.hbase.client.caching.mr"></a>121.1.1. Scan Caching in MapReduce Jobs</h4> <div class="paragraph"> <p>Scan settings in MapReduce jobs deserve special attention. Timeouts can result (e.g., UnknownScannerException) in Map tasks if it takes longer to process a batch of records before the client goes back to the RegionServer for the next set of data. @@ -23924,7 +24159,7 @@ If you process rows more slowly (e.g., lots of transformations per row, writes), </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.selection"><a class="anchor" href="#perf.hbase.client.selection"></a>118.2. Scan Attribute Selection</h3> +<h3 id="perf.hbase.client.selection"><a class="anchor" href="#perf.hbase.client.selection"></a>121.2. Scan Attribute Selection</h3> <div class="paragraph"> <p>Whenever a Scan is used to process large numbers of rows (and especially when used as a MapReduce source), be aware of which attributes are selected. If <code>scan.addFamily</code> is called then <em>all</em> of the attributes in the specified ColumnFamily will be returned to the client. @@ -23932,7 +24167,7 @@ If only a small number of the available attributes are to be processed, then onl </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.seek"><a class="anchor" href="#perf.hbase.client.seek"></a>118.3. Avoid scan seeks</h3> +<h3 id="perf.hbase.client.seek"><a class="anchor" href="#perf.hbase.client.seek"></a>121.3. 
Avoid scan seeks</h3> <div class="paragraph"> <p>When columns are selected explicitly with <code>scan.addColumn</code>, HBase will schedule seek operations to seek between the selected columns. When rows have few columns and each column has only a few versions this can be inefficient. @@ -23952,13 +24187,13 @@ table.getScanner(scan);</code></pre> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.mr.input"><a class="anchor" href="#perf.hbase.mr.input"></a>118.4. MapReduce - Input Splits</h3> +<h3 id="perf.hbase.mr.input"><a class="anchor" href="#perf.hbase.mr.input"></a>121.4. MapReduce - Input Splits</h3> <div class="paragraph"> <p>For MapReduce jobs that use HBase tables as a source, if there a pattern where the "slow" map tasks seem to have the same Input Split (i.e., the RegionServer serving the data), see the Troubleshooting Case Study in <a href="#casestudies.slownode">Case Study #1 (Performance Issue On A Single Node)</a>.</p> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.scannerclose"><a class="anchor" href="#perf.hbase.client.scannerclose"></a>118.5. Close ResultScanners</h3> +<h3 id="perf.hbase.client.scannerclose"><a class="anchor" href="#perf.hbase.client.scannerclose"></a>121.5. Close ResultScanners</h3> <div class="paragraph"> <p>This isn’t so much about improving performance but rather <em>avoiding</em> performance problems. If you forget to close <a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html">ResultScanners</a> you can cause problems on the RegionServers. @@ -23980,7 +24215,7 @@ table.close();</code></pre> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.blockcache"><a class="anchor" href="#perf.hbase.client.blockcache"></a>118.6. Block Cache</h3> +<h3 id="perf.hbase.client.blockcache"><a class="anchor" href="#perf.hbase.client.blockcache"></a>121.6. 
Block Cache</h3> <div class="paragraph"> <p><a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</a> instances can be set to use the block cache in the RegionServer via the <code>setCacheBlocks</code> method. For input Scans to MapReduce jobs, this should be <code>false</code>. @@ -23992,7 +24227,7 @@ See <a href="#offheap.blockcache">Off-heap Block Cache</a></p> </div> </div> <div class="sect2"> -<h3 id="perf.hbase.client.rowkeyonly"><a class="anchor" href="#perf.hbase.client.rowkeyonly"></a>118.7
<TRUNCATED>
