http://git-wip-us.apache.org/repos/asf/hbase-site/blob/21347dff/book.html ---------------------------------------------------------------------- diff --git a/book.html b/book.html index cbbdddd..233b28d 100644 --- a/book.html +++ b/book.html @@ -138,166 +138,172 @@ <li><a href="#hbase_mob">75. Storing Medium-sized Objects (MOB)</a></li> </ul> </li> +<li><a href="#inmemory_compaction">In-memory Compaction</a> +<ul class="sectlevel1"> +<li><a href="#imc.overview">76. Overview</a></li> +<li><a href="#_enabling">77. Enabling</a></li> +</ul> +</li> <li><a href="#backuprestore">Backup and Restore</a> <ul class="sectlevel1"> -<li><a href="#br.overview">76. Overview</a></li> -<li><a href="#br.terminology">77. Terminology</a></li> -<li><a href="#br.planning">78. Planning</a></li> -<li><a href="#br.initial.setup">79. First-time configuration steps</a></li> -<li><a href="#_backup_and_restore_commands">80. Backup and Restore commands</a></li> -<li><a href="#br.administration">81. Administration of Backup Images</a></li> -<li><a href="#br.backup.configuration">82. Configuration keys</a></li> -<li><a href="#br.best.practices">83. Best Practices</a></li> -<li><a href="#br.s3.backup.scenario">84. Scenario: Safeguarding Application Datasets on Amazon S3</a></li> -<li><a href="#br.data.security">85. Security of Backup Data</a></li> -<li><a href="#br.technical.details">86. Technical Details of Incremental Backup and Restore</a></li> -<li><a href="#br.filesystem.growth.warning">87. A Warning on File System Growth</a></li> -<li><a href="#br.backup.capacity.planning">88. Capacity Planning</a></li> -<li><a href="#br.limitations">89. Limitations of the Backup and Restore Utility</a></li> +<li><a href="#br.overview">78. Overview</a></li> +<li><a href="#br.terminology">79. Terminology</a></li> +<li><a href="#br.planning">80. Planning</a></li> +<li><a href="#br.initial.setup">81. First-time configuration steps</a></li> +<li><a href="#_backup_and_restore_commands">82. 
Backup and Restore commands</a></li> +<li><a href="#br.administration">83. Administration of Backup Images</a></li> +<li><a href="#br.backup.configuration">84. Configuration keys</a></li> +<li><a href="#br.best.practices">85. Best Practices</a></li> +<li><a href="#br.s3.backup.scenario">86. Scenario: Safeguarding Application Datasets on Amazon S3</a></li> +<li><a href="#br.data.security">87. Security of Backup Data</a></li> +<li><a href="#br.technical.details">88. Technical Details of Incremental Backup and Restore</a></li> +<li><a href="#br.filesystem.growth.warning">89. A Warning on File System Growth</a></li> +<li><a href="#br.backup.capacity.planning">90. Capacity Planning</a></li> +<li><a href="#br.limitations">91. Limitations of the Backup and Restore Utility</a></li> </ul> </li> <li><a href="#hbase_apis">Apache HBase APIs</a> <ul class="sectlevel1"> -<li><a href="#_examples">90. Examples</a></li> +<li><a href="#_examples">92. Examples</a></li> </ul> </li> <li><a href="#external_apis">Apache HBase External APIs</a> <ul class="sectlevel1"> -<li><a href="#_rest">91. REST</a></li> -<li><a href="#_thrift">92. Thrift</a></li> -<li><a href="#c">93. C/C++ Apache HBase Client</a></li> -<li><a href="#jdo">94. Using Java Data Objects (JDO) with HBase</a></li> -<li><a href="#scala">95. Scala</a></li> -<li><a href="#jython">96. Jython</a></li> +<li><a href="#_rest">93. REST</a></li> +<li><a href="#_thrift">94. Thrift</a></li> +<li><a href="#c">95. C/C++ Apache HBase Client</a></li> +<li><a href="#jdo">96. Using Java Data Objects (JDO) with HBase</a></li> +<li><a href="#scala">97. Scala</a></li> +<li><a href="#jython">98. Jython</a></li> </ul> </li> <li><a href="#thrift">Thrift API and Filter Language</a> <ul class="sectlevel1"> -<li><a href="#thrift.filter_language">97. Filter Language</a></li> +<li><a href="#thrift.filter_language">99. Filter Language</a></li> </ul> </li> <li><a href="#spark">HBase and Spark</a> <ul class="sectlevel1"> -<li><a href="#_basic_spark">98. 
Basic Spark</a></li> -<li><a href="#_spark_streaming">99. Spark Streaming</a></li> -<li><a href="#_bulk_load">100. Bulk Load</a></li> -<li><a href="#_sparksql_dataframes">101. SparkSQL/DataFrames</a></li> +<li><a href="#_basic_spark">100. Basic Spark</a></li> +<li><a href="#_spark_streaming">101. Spark Streaming</a></li> +<li><a href="#_bulk_load">102. Bulk Load</a></li> +<li><a href="#_sparksql_dataframes">103. SparkSQL/DataFrames</a></li> </ul> </li> <li><a href="#cp">Apache HBase Coprocessors</a> <ul class="sectlevel1"> -<li><a href="#_coprocessor_overview">102. Coprocessor Overview</a></li> -<li><a href="#_types_of_coprocessors">103. Types of Coprocessors</a></li> -<li><a href="#cp_loading">104. Loading Coprocessors</a></li> -<li><a href="#cp_example">105. Examples</a></li> -<li><a href="#_guidelines_for_deploying_a_coprocessor">106. Guidelines For Deploying A Coprocessor</a></li> -<li><a href="#_restricting_coprocessor_usage">107. Restricting Coprocessor Usage</a></li> +<li><a href="#_coprocessor_overview">104. Coprocessor Overview</a></li> +<li><a href="#_types_of_coprocessors">105. Types of Coprocessors</a></li> +<li><a href="#cp_loading">106. Loading Coprocessors</a></li> +<li><a href="#cp_example">107. Examples</a></li> +<li><a href="#_guidelines_for_deploying_a_coprocessor">108. Guidelines For Deploying A Coprocessor</a></li> +<li><a href="#_restricting_coprocessor_usage">109. Restricting Coprocessor Usage</a></li> </ul> </li> <li><a href="#performance">Apache HBase Performance Tuning</a> <ul class="sectlevel1"> -<li><a href="#perf.os">108. Operating System</a></li> -<li><a href="#perf.network">109. Network</a></li> -<li><a href="#jvm">110. Java</a></li> -<li><a href="#perf.configurations">111. HBase Configurations</a></li> -<li><a href="#perf.zookeeper">112. ZooKeeper</a></li> -<li><a href="#perf.schema">113. Schema Design</a></li> -<li><a href="#perf.general">114. HBase General Patterns</a></li> -<li><a href="#perf.writing">115. 
Writing to HBase</a></li> -<li><a href="#perf.reading">116. Reading from HBase</a></li> -<li><a href="#perf.deleting">117. Deleting from HBase</a></li> -<li><a href="#perf.hdfs">118. HDFS</a></li> -<li><a href="#perf.ec2">119. Amazon EC2</a></li> -<li><a href="#perf.hbase.mr.cluster">120. Collocating HBase and MapReduce</a></li> -<li><a href="#perf.casestudy">121. Case Studies</a></li> +<li><a href="#perf.os">110. Operating System</a></li> +<li><a href="#perf.network">111. Network</a></li> +<li><a href="#jvm">112. Java</a></li> +<li><a href="#perf.configurations">113. HBase Configurations</a></li> +<li><a href="#perf.zookeeper">114. ZooKeeper</a></li> +<li><a href="#perf.schema">115. Schema Design</a></li> +<li><a href="#perf.general">116. HBase General Patterns</a></li> +<li><a href="#perf.writing">117. Writing to HBase</a></li> +<li><a href="#perf.reading">118. Reading from HBase</a></li> +<li><a href="#perf.deleting">119. Deleting from HBase</a></li> +<li><a href="#perf.hdfs">120. HDFS</a></li> +<li><a href="#perf.ec2">121. Amazon EC2</a></li> +<li><a href="#perf.hbase.mr.cluster">122. Collocating HBase and MapReduce</a></li> +<li><a href="#perf.casestudy">123. Case Studies</a></li> </ul> </li> <li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#trouble.general">122. General Guidelines</a></li> -<li><a href="#trouble.log">123. Logs</a></li> -<li><a href="#trouble.resources">124. Resources</a></li> -<li><a href="#trouble.tools">125. Tools</a></li> -<li><a href="#trouble.client">126. Client</a></li> -<li><a href="#trouble.mapreduce">127. MapReduce</a></li> -<li><a href="#trouble.namenode">128. NameNode</a></li> -<li><a href="#trouble.network">129. Network</a></li> -<li><a href="#trouble.rs">130. RegionServer</a></li> -<li><a href="#trouble.master">131. Master</a></li> -<li><a href="#trouble.zookeeper">132. ZooKeeper</a></li> -<li><a href="#trouble.ec2">133. 
Amazon EC2</a></li> -<li><a href="#trouble.versions">134. HBase and Hadoop version issues</a></li> -<li><a href="#_hbase_and_hdfs">135. HBase and HDFS</a></li> -<li><a href="#trouble.tests">136. Running unit or integration tests</a></li> -<li><a href="#trouble.casestudy">137. Case Studies</a></li> -<li><a href="#trouble.crypto">138. Cryptographic Features</a></li> -<li><a href="#_operating_system_specific_issues">139. Operating System Specific Issues</a></li> -<li><a href="#_jdk_issues">140. JDK Issues</a></li> +<li><a href="#trouble.general">124. General Guidelines</a></li> +<li><a href="#trouble.log">125. Logs</a></li> +<li><a href="#trouble.resources">126. Resources</a></li> +<li><a href="#trouble.tools">127. Tools</a></li> +<li><a href="#trouble.client">128. Client</a></li> +<li><a href="#trouble.mapreduce">129. MapReduce</a></li> +<li><a href="#trouble.namenode">130. NameNode</a></li> +<li><a href="#trouble.network">131. Network</a></li> +<li><a href="#trouble.rs">132. RegionServer</a></li> +<li><a href="#trouble.master">133. Master</a></li> +<li><a href="#trouble.zookeeper">134. ZooKeeper</a></li> +<li><a href="#trouble.ec2">135. Amazon EC2</a></li> +<li><a href="#trouble.versions">136. HBase and Hadoop version issues</a></li> +<li><a href="#_hbase_and_hdfs">137. HBase and HDFS</a></li> +<li><a href="#trouble.tests">138. Running unit or integration tests</a></li> +<li><a href="#trouble.casestudy">139. Case Studies</a></li> +<li><a href="#trouble.crypto">140. Cryptographic Features</a></li> +<li><a href="#_operating_system_specific_issues">141. Operating System Specific Issues</a></li> +<li><a href="#_jdk_issues">142. JDK Issues</a></li> </ul> </li> <li><a href="#casestudies">Apache HBase Case Studies</a> <ul class="sectlevel1"> -<li><a href="#casestudies.overview">141. Overview</a></li> -<li><a href="#casestudies.schema">142. Schema Design</a></li> -<li><a href="#casestudies.perftroub">143. 
Performance/Troubleshooting</a></li> +<li><a href="#casestudies.overview">143. Overview</a></li> +<li><a href="#casestudies.schema">144. Schema Design</a></li> +<li><a href="#casestudies.perftroub">145. Performance/Troubleshooting</a></li> </ul> </li> <li><a href="#ops_mgt">Apache HBase Operational Management</a> <ul class="sectlevel1"> -<li><a href="#tools">144. HBase Tools and Utilities</a></li> -<li><a href="#ops.regionmgt">145. Region Management</a></li> -<li><a href="#node.management">146. Node Management</a></li> -<li><a href="#hbase_metrics">147. HBase Metrics</a></li> -<li><a href="#ops.monitoring">148. HBase Monitoring</a></li> -<li><a href="#_cluster_replication">149. Cluster Replication</a></li> -<li><a href="#_running_multiple_workloads_on_a_single_cluster">150. Running Multiple Workloads On a Single Cluster</a></li> -<li><a href="#ops.backup">151. HBase Backup</a></li> -<li><a href="#ops.snapshots">152. HBase Snapshots</a></li> -<li><a href="#snapshots_azure">153. Storing Snapshots in Microsoft Azure Blob Storage</a></li> -<li><a href="#ops.capacity">154. Capacity Planning and Region Sizing</a></li> -<li><a href="#table.rename">155. Table Rename</a></li> -<li><a href="#rsgroup">156. RegionServer Grouping</a></li> -<li><a href="#normalizer">157. Region Normalizer</a></li> +<li><a href="#tools">146. HBase Tools and Utilities</a></li> +<li><a href="#ops.regionmgt">147. Region Management</a></li> +<li><a href="#node.management">148. Node Management</a></li> +<li><a href="#hbase_metrics">149. HBase Metrics</a></li> +<li><a href="#ops.monitoring">150. HBase Monitoring</a></li> +<li><a href="#_cluster_replication">151. Cluster Replication</a></li> +<li><a href="#_running_multiple_workloads_on_a_single_cluster">152. Running Multiple Workloads On a Single Cluster</a></li> +<li><a href="#ops.backup">153. HBase Backup</a></li> +<li><a href="#ops.snapshots">154. HBase Snapshots</a></li> +<li><a href="#snapshots_azure">155. 
Storing Snapshots in Microsoft Azure Blob Storage</a></li> +<li><a href="#ops.capacity">156. Capacity Planning and Region Sizing</a></li> +<li><a href="#table.rename">157. Table Rename</a></li> +<li><a href="#rsgroup">158. RegionServer Grouping</a></li> +<li><a href="#normalizer">159. Region Normalizer</a></li> </ul> </li> <li><a href="#developer">Building and Developing Apache HBase</a> <ul class="sectlevel1"> -<li><a href="#getting.involved">158. Getting Involved</a></li> -<li><a href="#repos">159. Apache HBase Repositories</a></li> -<li><a href="#_ides">160. IDEs</a></li> -<li><a href="#build">161. Building Apache HBase</a></li> -<li><a href="#releasing">162. Releasing Apache HBase</a></li> -<li><a href="#hbase.rc.voting">163. Voting on Release Candidates</a></li> -<li><a href="#documentation">164. Generating the HBase Reference Guide</a></li> -<li><a href="#hbase.org">165. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> -<li><a href="#hbase.tests">166. Tests</a></li> -<li><a href="#developing">167. Developer Guidelines</a></li> +<li><a href="#getting.involved">160. Getting Involved</a></li> +<li><a href="#repos">161. Apache HBase Repositories</a></li> +<li><a href="#_ides">162. IDEs</a></li> +<li><a href="#build">163. Building Apache HBase</a></li> +<li><a href="#releasing">164. Releasing Apache HBase</a></li> +<li><a href="#hbase.rc.voting">165. Voting on Release Candidates</a></li> +<li><a href="#documentation">166. Generating the HBase Reference Guide</a></li> +<li><a href="#hbase.org">167. Updating <a href="https://hbase.apache.org">hbase.apache.org</a></a></li> +<li><a href="#hbase.tests">168. Tests</a></li> +<li><a href="#developing">169. Developer Guidelines</a></li> </ul> </li> <li><a href="#unit.tests">Unit Testing HBase Applications</a> <ul class="sectlevel1"> -<li><a href="#_junit">168. JUnit</a></li> -<li><a href="#mockito">169. Mockito</a></li> -<li><a href="#_mrunit">170. 
MRUnit</a></li> -<li><a href="#_integration_testing_with_an_hbase_mini_cluster">171. Integration Testing with an HBase Mini-Cluster</a></li> +<li><a href="#_junit">170. JUnit</a></li> +<li><a href="#mockito">171. Mockito</a></li> +<li><a href="#_mrunit">172. MRUnit</a></li> +<li><a href="#_integration_testing_with_an_hbase_mini_cluster">173. Integration Testing with an HBase Mini-Cluster</a></li> </ul> </li> <li><a href="#protobuf">Protobuf in HBase</a> <ul class="sectlevel1"> -<li><a href="#_protobuf">172. Protobuf</a></li> +<li><a href="#_protobuf">174. Protobuf</a></li> </ul> </li> <li><a href="#zookeeper">ZooKeeper</a> <ul class="sectlevel1"> -<li><a href="#_using_existing_zookeeper_ensemble">173. Using existing ZooKeeper ensemble</a></li> -<li><a href="#zk.sasl.auth">174. SASL Authentication with ZooKeeper</a></li> +<li><a href="#_using_existing_zookeeper_ensemble">175. Using existing ZooKeeper ensemble</a></li> +<li><a href="#zk.sasl.auth">176. SASL Authentication with ZooKeeper</a></li> </ul> </li> <li><a href="#community">Community</a> <ul class="sectlevel1"> -<li><a href="#_decisions">175. Decisions</a></li> -<li><a href="#community.roles">176. Community Roles</a></li> -<li><a href="#hbase.commit.msg.format">177. Commit Message format</a></li> +<li><a href="#_decisions">177. Decisions</a></li> +<li><a href="#community.roles">178. Community Roles</a></li> +<li><a href="#hbase.commit.msg.format">179. Commit Message format</a></li> </ul> </li> <li><a href="#_appendix">Appendix</a> @@ -315,8 +321,8 @@ <li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li> <li><a href="#orca">Appendix L: Apache HBase Orca</a></li> <li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li> -<li><a href="#tracing.client.modifications">178. Client Modifications</a></li> -<li><a href="#tracing.client.shell">179. Tracing from HBase Shell</a></li> +<li><a href="#tracing.client.modifications">180. 
Client Modifications</a></li> +<li><a href="#tracing.client.shell">181. Tracing from HBase Shell</a></li> <li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li> </ul> </li> @@ -17723,9 +17729,132 @@ hbase> major_compact 't1', 'c1', 'MOB'</pre> </div> </div> </div> +<h1 id="inmemory_compaction" class="sect0"><a class="anchor" href="#inmemory_compaction"></a>In-memory Compaction</h1> +<div class="sect1"> +<h2 id="imc.overview"><a class="anchor" href="#imc.overview"></a>76. Overview</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>In-memory Compaction (a.k.a. Accordion) is a new feature in HBase 2.0.0. +It was first introduced on the Apache HBase Blog at +<a href="https://blogs.apache.org/hbase/entry/accordion-hbase-breathes-with-in">Accordion: HBase Breathes with In-Memory Compaction</a>. +Quoting the blog:</p> +</div> +<div class="quoteblock"> +<blockquote> +<div class="paragraph"> +<p>Accordion reapplies the LSM principal [<em>Log-Structured-Merge Tree</em>, the design pattern upon which HBase is based] to MemStore, in order to eliminate redundancies and other overhead while the data is still in RAM. Doing so decreases the frequency of flushes to HDFS, thereby reducing the write amplification and the overall disk footprint. With less flushes, the write operations are stalled less frequently as the MemStore overflows, therefore the write performance is improved. Less data on disk also implies less pressure on the block cache, higher hit rates, and eventually better read response times. Finally, having less disk writes also means having less compaction happening in the background, i.e., less cycles are stolen from productive (read and write) work.
All in all, the effect of in-memory compaction can be envisioned as a catalyst that enables the system move faster as a whole.</p> +</div> +</blockquote> +</div> +<div class="paragraph"> +<p>A developer view is available at +<a href="https://blogs.apache.org/hbase/entry/accordion-developer-view-of-in">Accordion: Developer View of In-Memory Compaction</a>.</p> +</div> +<div class="paragraph"> +<p>In-memory compaction works best when data churn is high; overwrites and excess versions +can be eliminated while the data is still in memory. If all writes are unique, +it may reduce write throughput (in-memory compaction costs CPU). We suggest you test +and compare before deploying to production.</p> +</div> +<div class="paragraph"> +<p>In this section we describe how to enable Accordion and the available configurations.</p> +</div> +</div> +</div> +<div class="sect1"> +<h2 id="_enabling"><a class="anchor" href="#_enabling"></a>77. Enabling</h2> +<div class="sectionbody"> +<div class="paragraph"> +<p>To enable in-memory compactions, set the <em>IN_MEMORY_COMPACTION</em> attribute +on each column family where you want the behavior. The <em>IN_MEMORY_COMPACTION</em> +attribute can have one of four values.</p> +</div> +<div class="ulist"> +<ul> +<li> +<p><em>NONE</em>: No in-memory compaction.</p> +</li> +<li> +<p><em>BASIC</em>: The basic policy enables flushing and keeps a pipeline of flushes until the pipeline maximum threshold is reached, at which point it flushes to disk. It performs no in-memory compaction, but can help throughput because data is moved from the profligate, native ConcurrentSkipListMap data type to more compact (and efficient) data types.</p> +</li> +<li> +<p><em>EAGER</em>: This is the <em>BASIC</em> policy plus in-memory compaction of flushes (much like the on-disk compactions done to hfiles); on compaction we apply the on-disk rules, eliminating versions, duplicates, ttl’d cells, etc.</p> +</li> +<li> +<p><em>ADAPTIVE</em>: Adaptive compaction adapts to the workload.
It applies either index compaction or data compaction based on the ratio of duplicate cells in the data. Experimental.</p> +</li> +</ul> +</div> +<div class="paragraph"> +<p>To enable <em>BASIC</em> on the <em>info</em> column family in the table <em>radish</em>, disable the table and add the attribute to the <em>info</em> column family, and then reenable:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre class="CodeRay highlight"><code data-lang="ruby">hbase(main):<span class="integer">002</span>:<span class="integer">0</span>> disable <span class="string"><span class="delimiter">'</span><span class="content">radish</span><span class="delimiter">'</span></span> +<span class="constant">Took</span> <span class="float">0.5570</span> seconds +hbase(main):<span class="integer">003</span>:<span class="integer">0</span>> alter <span class="string"><span class="delimiter">'</span><span class="content">radish</span><span class="delimiter">'</span></span>, {<span class="constant">NAME</span> => <span class="string"><span class="delimiter">'</span><span class="content">info</span><span class="delimiter">'</span></span>, <span class="constant">IN_MEMORY_COMPACTION</span> => <span class="string"><span class="delimiter">'</span><span class="content">BASIC</span><span class="delimiter">'</span></span>} +<span class="constant">Updating</span> all regions with the new schema... +<span class="constant">All</span> regions updated. +Done. 
+Took <span class="float">1.2413</span> seconds +hbase(main):<span class="integer">004</span>:<span class="integer">0</span>> describe <span class="string"><span class="delimiter">'</span><span class="content">radish</span><span class="delimiter">'</span></span> +<span class="constant">Table</span> radish is <span class="constant">DISABLED</span> +radish +<span class="constant">COLUMN</span> <span class="constant">FAMILIES</span> <span class="constant">DESCRIPTION</span> +{<span class="constant">NAME</span> => <span class="string"><span class="delimiter">'</span><span class="content">info</span><span class="delimiter">'</span></span>, <span class="constant">VERSIONS</span> => <span class="string"><span class="delimiter">'</span><span class="content">1</span><span class="delimiter">'</span></span>, <span class="constant">EVICT_BLOCKS_ON_CLOSE</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">NEW_VERSION_BEHAVIOR</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">KEEP_DELETED_CELLS</span> => <span class="string"><span class="delimiter">'</span><span class="content">FALSE</span><span class="delimiter">'</span></span>, <span class="constant">CACHE_DATA_ON_WRITE</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">DATA_BLOCK_ENCODING</span> => <span class="string"><span class="delimiter">'</span><span class="content">NONE</span><span class="delimiter">'</span></span>, <span class="constant">TTL</span> => <span class="string"><span class="delimiter">'</span><span class="content">FOREVER</span><span class="delimiter">'</span></span>, <span class="constant">MIN_VERSIONS</span> => <span class="string"><span class="delimiter">'</span><span
class="content">0</span><span class="delimiter">'</span></span>, <span class="constant">REPLICATION_SCOPE</span> => <span class="string"><span class="delimiter">'</span><span class="content">0</span><span class="delimiter">'</span></span>, <span class="constant">BLOOMFILTER</span> => <span class="string"><span class="delimiter">'</span><span class="content">ROW</span><span class="delimiter">'</span></span>, <span class="constant">CACHE_INDEX_ON_WRITE</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">IN_MEMORY</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">CACHE_BLOOMS_ON_WRITE</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">PREFETCH_BLOCKS_ON_OPEN</span> => <span class="string"><span class="delimiter">'</span><span class="content">false</span><span class="delimiter">'</span></span>, <span class="constant">COMPRESSION</span> => <span class="string"><span class="delimiter">'</span><span class="content">NONE</span><span class="delimiter">'</span></span>, <span class="constant">BLOCKCACHE</span> => <span class="string"><span class="delimiter">'</span><span class="content">true</span><span class="delimiter">'</span></span>, <span class="constant">BLOCKSIZE</span> => <span class="string"><span class="delimiter">'</span><span class="content">65536</span><span class="delimiter">'</span></span>, <span class="constant">METADATA</span> => { +<span class="string"><span class="delimiter">'</span><span class="content">IN_MEMORY_COMPACTION</span><span class="delimiter">'</span></span> => <span class="string"><span class="delimiter">'</span><span class="content">BASIC</span><span class="delimiter">'</span></span>}} +<span class="integer">1</span>
row(s) +<span class="constant">Took</span> <span class="float">0.0239</span> seconds +hbase(main):<span class="integer">005</span>:<span class="integer">0</span>> enable <span class="string"><span class="delimiter">'</span><span class="content">radish</span><span class="delimiter">'</span></span> +<span class="constant">Took</span> <span class="float">0.7537</span> seconds</code></pre> +</div> +</div> +<div class="paragraph"> +<p>Note how the <em>IN_MEMORY_COMPACTION</em> attribute shows as part of the <em>METADATA</em> map.</p> +</div> +<div class="paragraph"> +<p>There is also a global configuration, <em>hbase.hregion.compacting.memstore.type</em>, which you can set in your <em>hbase-site.xml</em> file. Use it to set the +default for newly created tables. (When a column family Store is created, we first look for the +<em>IN_MEMORY_COMPACTION</em> setting in the column family configuration; if none is found, we use the value of <em>hbase.hregion.compacting.memstore.type</em>, whose default is +<em>BASIC</em>.)</p> +</div> +<div class="paragraph"> +<p>By default, new hbase system tables will have <em>BASIC</em> in-memory compaction set.
To specify otherwise +at table creation, set <em>hbase.hregion.compacting.memstore.type</em> to <em>NONE</em>. (Note that setting this value +after the system tables have been created has no retroactive effect; you will have to alter those tables +to set the in-memory attribute to <em>NONE</em>.)</p> +</div> +<div class="paragraph"> +<p>The threshold at which an in-memory flush happens is calculated by dividing the configured region flush size (set in the table descriptor +or read from <em>hbase.hregion.memstore.flush.size</em>) by the number of column families and then multiplying by +<em>hbase.memstore.inmemoryflush.threshold.factor</em> (default is 0.1). For example, a 128 MB region flush size and two column families give an in-memory flush threshold of 128 MB / 2 * 0.1 = 6.4 MB.</p> +</div> +<div class="paragraph"> +<p>The number of flushes carried by the pipeline is monitored so as to fit within the bounds of memstore sizing, +but you can also set a maximum on the total number of flushes by setting +<em>hbase.hregion.compacting.pipeline.segments.limit</em>. The default is 4.</p> +</div> +<div class="paragraph"> +<p>When a column family Store is created, it logs which memstore type is in effect. As of this writing +there are two: the old-school <em>DefaultMemStore</em>, which fills a <em>ConcurrentSkipListMap</em> and then flushes +to disk, and the new <em>CompactingMemStore</em>, the implementation that provides the +in-memory compaction facility. Here is a log line from a RegionServer that shows a column +family Store named <em>family</em> configured to use a <em>CompactingMemStore</em>:</p> +</div> +<div class="listingblock"> +<div class="content"> +<pre>
+2018-03-30 11:02:24,466 INFO [Time-limited test] regionserver.HStore(325): Store=family, memstore type=CompactingMemStore, storagePolicy=HOT, verifyBulkLoads=false, parallelPutCountPrintThreshold=10</pre> +</div> +</div> +<div class="paragraph"> +<p>Enable TRACE-level logging on the CompactingMemStore class (<em>org.apache.hadoop.hbase.regionserver.CompactingMemStore</em>) to see detail on its operation.</p> +</div> +</div> +</div> <h1 id="backuprestore" class="sect0"><a class="anchor" href="#backuprestore"></a>Backup and Restore</h1> <div class="sect1"> -<h2 id="br.overview"><a class="anchor" href="#br.overview"></a>76. Overview</h2> +<h2 id="br.overview"><a class="anchor" href="#br.overview"></a>78. Overview</h2> <div class="sectionbody"> <div class="paragraph"> <p>Backup and restore is a standard operation provided by many databases. An effective backup and restore @@ -17753,7 +17882,7 @@ backup implementation is the novel improvement over the previous "art" provided </div> </div> <div class="sect1"> -<h2 id="br.terminology"><a class="anchor" href="#br.terminology"></a>77. Terminology</h2> +<h2 id="br.terminology"><a class="anchor" href="#br.terminology"></a>79. Terminology</h2> <div class="sectionbody"> <div class="paragraph"> <p>The backup and restore feature introduces new terminology which can be used to understand how control flows through the @@ -17781,7 +17910,7 @@ system.</p> </div> </div> <div class="sect1"> -<h2 id="br.planning"><a class="anchor" href="#br.planning"></a>78. Planning</h2> +<h2 id="br.planning"><a class="anchor" href="#br.planning"></a>80. Planning</h2> <div class="sectionbody"> <div class="paragraph"> <p>There are some common strategies which can be used to implement backup and restore in your environment. 
The following section @@ -17801,7 +17930,7 @@ This is related to the open issue <a href="https://issues.apache.org/jira/browse </table> </div> <div class="sect2"> -<h3 id="br.intracluster.backup"><a class="anchor" href="#br.intracluster.backup"></a>78.1. Backup within a cluster</h3> +<h3 id="br.intracluster.backup"><a class="anchor" href="#br.intracluster.backup"></a>80.1. Backup within a cluster</h3> <div class="paragraph"> <p>This strategy stores the backups on the same cluster as where the backup was taken. This approach is only appropriate for testing as it does not provide any additional safety on top of what the software itself already provides.</p> @@ -17814,7 +17943,7 @@ as it does not provide any additional safety on top of what the software itself </div> </div> <div class="sect2"> -<h3 id="br.dedicated.cluster.backup"><a class="anchor" href="#br.dedicated.cluster.backup"></a>78.2. Backup using a dedicated cluster</h3> +<h3 id="br.dedicated.cluster.backup"><a class="anchor" href="#br.dedicated.cluster.backup"></a>80.2. Backup using a dedicated cluster</h3> <div class="paragraph"> <p>This strategy provides greater fault tolerance and provides a path towards disaster recovery. In this setting, you will store the backup on a separate HDFS cluster by supplying the backup destination cluster’s HDFS URL to the backup utility. @@ -17831,7 +17960,7 @@ You should consider backing up to a different physical location, such as a diffe </div> </div> <div class="sect2"> -<h3 id="br.cloud.or.vendor.backup"><a class="anchor" href="#br.cloud.or.vendor.backup"></a>78.3. Backup to the Cloud or a storage vendor appliance</h3> +<h3 id="br.cloud.or.vendor.backup"><a class="anchor" href="#br.cloud.or.vendor.backup"></a>80.3. Backup to the Cloud or a storage vendor appliance</h3> <div class="paragraph"> <p>Another approach to safeguarding HBase incremental backups is to store the data on provisioned, secure servers that belong to third-party vendors and that are located off-site.
The vendor can be a public cloud provider or a storage vendor who uses @@ -17860,7 +17989,7 @@ of the backup files from HDFS or S3. </div> </div> <div class="sect1"> -<h2 id="br.initial.setup"><a class="anchor" href="#br.initial.setup"></a>79. First-time configuration steps</h2> +<h2 id="br.initial.setup"><a class="anchor" href="#br.initial.setup"></a>81. First-time configuration steps</h2> <div class="sectionbody"> <div class="paragraph"> <p>This section contains the necessary configuration changes that must be made in order to use the backup and restore feature. @@ -17868,7 +17997,7 @@ As this feature makes significant use of YARN’s MapReduce framework to par changes extend outside of just <code>hbase-site.xml</code>.</p> </div> <div class="sect2"> -<h3 id="_allow_the_hbase_system_user_in_yarn"><a class="anchor" href="#_allow_the_hbase_system_user_in_yarn"></a>79.1. Allow the "hbase" system user in YARN</h3> +<h3 id="_allow_the_hbase_system_user_in_yarn"><a class="anchor" href="#_allow_the_hbase_system_user_in_yarn"></a>81.1. Allow the "hbase" system user in YARN</h3> <div class="paragraph"> <p>The YARN <strong>container-executor.cfg</strong> configuration file must have the following property setting: <em>allowed.system.users=hbase</em>. No spaces are allowed in entries of this configuration file.</p> @@ -17899,7 +18028,7 @@ min.user.id=<span class="integer">500</span></code></pre> </div> </div> <div class="sect2"> -<h3 id="_hbase_specific_changes"><a class="anchor" href="#_hbase_specific_changes"></a>79.2. HBase specific changes</h3> +<h3 id="_hbase_specific_changes"><a class="anchor" href="#_hbase_specific_changes"></a>81.2. 
HBase specific changes</h3> <div class="paragraph"> <p>Add the following properties to hbase-site.xml and restart HBase if it is already running.</p> </div> @@ -17947,7 +18076,7 @@ The ",…​" is an ellipsis meant to imply that this is a comma-separat </div> </div> <div class="sect1"> -<h2 id="_backup_and_restore_commands"><a class="anchor" href="#_backup_and_restore_commands"></a>80. Backup and Restore commands</h2> +<h2 id="_backup_and_restore_commands"><a class="anchor" href="#_backup_and_restore_commands"></a>82. Backup and Restore commands</h2> <div class="sectionbody"> <div class="paragraph"> <p>This covers the command-line utilities that administrators would run to create, restore, and merge backups. Tools to @@ -17958,7 +18087,7 @@ inspect details on specific backup sessions is covered in the next section, <a h and its options. The below information is captured in this help message for each command.</p> </div> <div class="sect2"> -<h3 id="br.creating.complete.backup"><a class="anchor" href="#br.creating.complete.backup"></a>80.1. Creating a Backup Image</h3> +<h3 id="br.creating.complete.backup"><a class="anchor" href="#br.creating.complete.backup"></a>82.1. Creating a Backup Image</h3> <div class="admonitionblock note"> <table> <tr> @@ -18007,7 +18136,7 @@ dataset with a restore operation, having the backup ID readily available can sav </table> </div> <div class="sect3"> -<h4 id="br.create.positional.cli.arguments"><a class="anchor" href="#br.create.positional.cli.arguments"></a>80.1.1. Positional Command-Line Arguments</h4> +<h4 id="br.create.positional.cli.arguments"><a class="anchor" href="#br.create.positional.cli.arguments"></a>82.1.1. 
Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>type</em></dt> @@ -18024,7 +18153,7 @@ are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p> </div> </div> <div class="sect3"> -<h4 id="br.create.named.cli.arguments"><a class="anchor" href="#br.create.named.cli.arguments"></a>80.1.2. Named Command-Line Arguments</h4> +<h4 id="br.create.named.cli.arguments"><a class="anchor" href="#br.create.named.cli.arguments"></a>82.1.2. Named Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>-t <table_name[,table_name]></em></dt> @@ -18061,7 +18190,7 @@ is useful to prevent backup tasks from stealing resources away from other MapRed </div> </div> <div class="sect3"> -<h4 id="br.usage.examples"><a class="anchor" href="#br.usage.examples"></a>80.1.3. Example usage</h4> +<h4 id="br.usage.examples"><a class="anchor" href="#br.usage.examples"></a>82.1.3. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup create full hdfs:<span class="comment">//host5:8020/data/backup -t SALES2,SALES3 -w 3</span></code></pre> @@ -18074,7 +18203,7 @@ in the path <em>/data/backup</em>. The <em>-w</em> option specifies that no more </div> </div> <div class="sect2"> -<h3 id="br.restoring.backup"><a class="anchor" href="#br.restoring.backup"></a>80.2. Restoring a Backup Image</h3> +<h3 id="br.restoring.backup"><a class="anchor" href="#br.restoring.backup"></a>82.2. Restoring a Backup Image</h3> <div class="paragraph"> <p>Run the following command as an HBase superuser. 
You can only restore a backup on a running HBase cluster because the data must be redistributed the RegionServers for the operation to complete successfully.</p> @@ -18085,7 +18214,7 @@ redistributed the RegionServers for the operation to complete successfully.</p> </div> </div> <div class="sect3"> -<h4 id="br.restore.positional.args"><a class="anchor" href="#br.restore.positional.args"></a>80.2.1. Positional Command-Line Arguments</h4> +<h4 id="br.restore.positional.args"><a class="anchor" href="#br.restore.positional.args"></a>82.2.1. Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_path</em></dt> @@ -18101,7 +18230,7 @@ are <em>hdfs:</em>, <em>webhdfs:</em>, <em>gpfs:</em>, and <em>s3fs:</em>.</p> </div> </div> <div class="sect3"> -<h4 id="br.restore.named.args"><a class="anchor" href="#br.restore.named.args"></a>80.2.2. Named Command-Line Arguments</h4> +<h4 id="br.restore.named.args"><a class="anchor" href="#br.restore.named.args"></a>82.2.2. Named Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>-t <table_name[,table_name]></em></dt> @@ -18137,7 +18266,7 @@ this option is provided, there must be an equal number of entries provided in th </div> </div> <div class="sect3"> -<h4 id="br.restore.usage"><a class="anchor" href="#br.restore.usage"></a>80.2.3. Example of Usage</h4> +<h4 id="br.restore.usage"><a class="anchor" href="#br.restore.usage"></a>82.2.3. Example of Usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java">hbase backup restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2</code></pre> @@ -18152,7 +18281,7 @@ this option is provided, there must be an equal number of entries provided in th </div> </div> <div class="sect2"> -<h3 id="br.merge.backup"><a class="anchor" href="#br.merge.backup"></a>80.3. 
Merging Incremental Backup Images</h3> +<h3 id="br.merge.backup"><a class="anchor" href="#br.merge.backup"></a>82.3. Merging Incremental Backup Images</h3> <div class="paragraph"> <p>This command can be used to merge two or more incremental backup images into a single incremental backup image. This can be used to consolidate multiple, small incremental backup images into a single @@ -18165,7 +18294,7 @@ into a daily incremental backup image, or daily incremental backups into a weekl </div> </div> <div class="sect3"> -<h4 id="br.merge.backup.positional.cli.arguments"><a class="anchor" href="#br.merge.backup.positional.cli.arguments"></a>80.3.1. Positional Command-Line Arguments</h4> +<h4 id="br.merge.backup.positional.cli.arguments"><a class="anchor" href="#br.merge.backup.positional.cli.arguments"></a>82.3.1. Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_ids</em></dt> @@ -18176,13 +18305,13 @@ into a daily incremental backup image, or daily incremental backups into a weekl </div> </div> <div class="sect3"> -<h4 id="br.merge.backup.named.cli.arguments"><a class="anchor" href="#br.merge.backup.named.cli.arguments"></a>80.3.2. Named Command-Line Arguments</h4> +<h4 id="br.merge.backup.named.cli.arguments"><a class="anchor" href="#br.merge.backup.named.cli.arguments"></a>82.3.2. Named Command-Line Arguments</h4> <div class="paragraph"> <p>None.</p> </div> </div> <div class="sect3"> -<h4 id="br.merge.backup.example"><a class="anchor" href="#br.merge.backup.example"></a>80.3.3. Example usage</h4> +<h4 id="br.merge.backup.example"><a class="anchor" href="#br.merge.backup.example"></a>82.3.3. 
Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup merge backupId_1467823988425,backupId_1467827588425</code></pre> @@ -18191,7 +18320,7 @@ into a daily incremental backup image, or daily incremental backups into a weekl </div> </div> <div class="sect2"> -<h3 id="br.using.backup.sets"><a class="anchor" href="#br.using.backup.sets"></a>80.4. Using Backup Sets</h3> +<h3 id="br.using.backup.sets"><a class="anchor" href="#br.using.backup.sets"></a>82.4. Using Backup Sets</h3> <div class="paragraph"> <p>Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input of table names. You can group tables into a named backup set with the <code>hbase backup set add</code> command. You can then use @@ -18242,7 +18371,7 @@ backup set metadata, then you must specify individual table names to restore the </div> </div> <div class="sect3"> -<h4 id="br.set.subcommands"><a class="anchor" href="#br.set.subcommands"></a>80.4.1. Backup Set Subcommands</h4> +<h4 id="br.set.subcommands"><a class="anchor" href="#br.set.subcommands"></a>82.4.1. Backup Set Subcommands</h4> <div class="paragraph"> <p>The following list details subcommands of the hbase backup set command.</p> </div> @@ -18287,7 +18416,7 @@ a valid value for the <em>backup_set_name</em> value.</p> </div> </div> <div class="sect3"> -<h4 id="br.set.positional.cli.arguments"><a class="anchor" href="#br.set.positional.cli.arguments"></a>80.4.2. Positional Command-Line Arguments</h4> +<h4 id="br.set.positional.cli.arguments"><a class="anchor" href="#br.set.positional.cli.arguments"></a>82.4.2. Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_set_name</em></dt> @@ -18316,7 +18445,7 @@ or remote cluster, backup strategy. 
This information can help you in case of fai </div> </div> <div class="sect3"> -<h4 id="br.set.usage"><a class="anchor" href="#br.set.usage"></a>80.4.3. Example of Usage</h4> +<h4 id="br.set.usage"><a class="anchor" href="#br.set.usage"></a>82.4.3. Example of Usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup set add Q1Data TEAM3,TEAM_4</code></pre> @@ -18340,7 +18469,7 @@ or remote cluster, backup strategy. This information can help you in case of fai </div> </div> <div class="sect1"> -<h2 id="br.administration"><a class="anchor" href="#br.administration"></a>81. Administration of Backup Images</h2> +<h2 id="br.administration"><a class="anchor" href="#br.administration"></a>83. Administration of Backup Images</h2> <div class="sectionbody"> <div class="paragraph"> <p>The <code>hbase backup</code> command has several subcommands that help with administering backup images as they accumulate. Most production @@ -18353,7 +18482,7 @@ You can also delete backup images.</p> the HBase superuser.</p> </div> <div class="sect2"> -<h3 id="br.managing.backup.progress"><a class="anchor" href="#br.managing.backup.progress"></a>81.1. Managing Backup Progress</h3> +<h3 id="br.managing.backup.progress"><a class="anchor" href="#br.managing.backup.progress"></a>83.1. Managing Backup Progress</h3> <div class="paragraph"> <p>You can monitor a running backup in another terminal session by running the <em>hbase backup progress</em> command and specifying the backup ID as an argument.</p> </div> @@ -18366,7 +18495,7 @@ the HBase superuser.</p> </div> </div> <div class="sect3"> -<h4 id="br.progress.positional.cli.arguments"><a class="anchor" href="#br.progress.positional.cli.arguments"></a>81.1.1. Positional Command-Line Arguments</h4> +<h4 id="br.progress.positional.cli.arguments"><a class="anchor" href="#br.progress.positional.cli.arguments"></a>83.1.1. 
Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_id</em></dt> @@ -18377,13 +18506,13 @@ the HBase superuser.</p> </div> </div> <div class="sect3"> -<h4 id="br.progress.named.cli.arguments"><a class="anchor" href="#br.progress.named.cli.arguments"></a>81.1.2. Named Command-Line Arguments</h4> +<h4 id="br.progress.named.cli.arguments"><a class="anchor" href="#br.progress.named.cli.arguments"></a>83.1.2. Named Command-Line Arguments</h4> <div class="paragraph"> <p>None.</p> </div> </div> <div class="sect3"> -<h4 id="br.progress.example"><a class="anchor" href="#br.progress.example"></a>81.1.3. Example usage</h4> +<h4 id="br.progress.example"><a class="anchor" href="#br.progress.example"></a>83.1.3. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java">hbase backup progress backupId_1467823988425</code></pre> @@ -18392,7 +18521,7 @@ the HBase superuser.</p> </div> </div> <div class="sect2"> -<h3 id="br.managing.backup.history"><a class="anchor" href="#br.managing.backup.history"></a>81.2. Managing Backup History</h3> +<h3 id="br.managing.backup.history"><a class="anchor" href="#br.managing.backup.history"></a>83.2. Managing Backup History</h3> <div class="paragraph"> <p>This command displays a log of backup sessions. The information for each session includes backup ID, type (full or incremental), the tables in the backup, status, and start and end time. Specify the number of backup sessions to display with the optional -n argument.</p> @@ -18403,7 +18532,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.history.positional.cli.arguments"><a class="anchor" href="#br.history.positional.cli.arguments"></a>81.2.1. Positional Command-Line Arguments</h4> +<h4 id="br.history.positional.cli.arguments"><a class="anchor" href="#br.history.positional.cli.arguments"></a>83.2.1. 
Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_id</em></dt> @@ -18414,7 +18543,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.history.named.cli.arguments"><a class="anchor" href="#br.history.named.cli.arguments"></a>81.2.2. Named Command-Line Arguments</h4> +<h4 id="br.history.named.cli.arguments"><a class="anchor" href="#br.history.named.cli.arguments"></a>83.2.2. Named Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>-n <num_records></em></dt> @@ -18437,7 +18566,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.history.backup.example"><a class="anchor" href="#br.history.backup.example"></a>81.2.3. Example usage</h4> +<h4 id="br.history.backup.example"><a class="anchor" href="#br.history.backup.example"></a>83.2.3. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup history @@ -18448,7 +18577,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect2"> -<h3 id="br.describe.backup"><a class="anchor" href="#br.describe.backup"></a>81.3. Describing a Backup Image</h3> +<h3 id="br.describe.backup"><a class="anchor" href="#br.describe.backup"></a>83.3. Describing a Backup Image</h3> <div class="paragraph"> <p>This command can be used to obtain information about a specific backup image.</p> </div> @@ -18458,7 +18587,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.describe.backup.positional.cli.arguments"><a class="anchor" href="#br.describe.backup.positional.cli.arguments"></a>81.3.1. 
Positional Command-Line Arguments</h4> +<h4 id="br.describe.backup.positional.cli.arguments"><a class="anchor" href="#br.describe.backup.positional.cli.arguments"></a>83.3.1. Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_id</em></dt> @@ -18469,13 +18598,13 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.describe.backup.named.cli.arguments"><a class="anchor" href="#br.describe.backup.named.cli.arguments"></a>81.3.2. Named Command-Line Arguments</h4> +<h4 id="br.describe.backup.named.cli.arguments"><a class="anchor" href="#br.describe.backup.named.cli.arguments"></a>83.3.2. Named Command-Line Arguments</h4> <div class="paragraph"> <p>None.</p> </div> </div> <div class="sect3"> -<h4 id="br.describe.backup.example"><a class="anchor" href="#br.describe.backup.example"></a>81.3.3. Example usage</h4> +<h4 id="br.describe.backup.example"><a class="anchor" href="#br.describe.backup.example"></a>83.3.3. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup describe backupId_1467823988425</code></pre> @@ -18484,7 +18613,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect2"> -<h3 id="br.delete.backup"><a class="anchor" href="#br.delete.backup"></a>81.4. Deleting a Backup Image</h3> +<h3 id="br.delete.backup"><a class="anchor" href="#br.delete.backup"></a>83.4. Deleting a Backup Image</h3> <div class="paragraph"> <p>This command can be used to delete a backup image which is no longer needed.</p> </div> @@ -18494,7 +18623,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.delete.backup.positional.cli.arguments"><a class="anchor" href="#br.delete.backup.positional.cli.arguments"></a>81.4.1. 
Positional Command-Line Arguments</h4> +<h4 id="br.delete.backup.positional.cli.arguments"><a class="anchor" href="#br.delete.backup.positional.cli.arguments"></a>83.4.1. Positional Command-Line Arguments</h4> <div class="dlist"> <dl> <dt class="hdlist1"><em>backup_id</em></dt> @@ -18505,13 +18634,13 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect3"> -<h4 id="br.delete.backup.named.cli.arguments"><a class="anchor" href="#br.delete.backup.named.cli.arguments"></a>81.4.2. Named Command-Line Arguments</h4> +<h4 id="br.delete.backup.named.cli.arguments"><a class="anchor" href="#br.delete.backup.named.cli.arguments"></a>83.4.2. Named Command-Line Arguments</h4> <div class="paragraph"> <p>None.</p> </div> </div> <div class="sect3"> -<h4 id="br.delete.backup.example"><a class="anchor" href="#br.delete.backup.example"></a>81.4.3. Example usage</h4> +<h4 id="br.delete.backup.example"><a class="anchor" href="#br.delete.backup.example"></a>83.4.3. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup delete backupId_1467823988425</code></pre> @@ -18520,7 +18649,7 @@ in the backup, status, and start and end time. Specify the number of backup sess </div> </div> <div class="sect2"> -<h3 id="br.repair.backup"><a class="anchor" href="#br.repair.backup"></a>81.5. Backup Repair Command</h3> +<h3 id="br.repair.backup"><a class="anchor" href="#br.repair.backup"></a>83.5. Backup Repair Command</h3> <div class="paragraph"> <p>This command attempts to correct any inconsistencies in persisted backup metadata which exists as the result of software errors or unhandled failure scenarios. 
While the backup implementation tries @@ -18533,19 +18662,19 @@ automatically recover on its own.</p> </div> </div> <div class="sect3"> -<h4 id="br.repair.backup.positional.cli.arguments"><a class="anchor" href="#br.repair.backup.positional.cli.arguments"></a>81.5.1. Positional Command-Line Arguments</h4> +<h4 id="br.repair.backup.positional.cli.arguments"><a class="anchor" href="#br.repair.backup.positional.cli.arguments"></a>83.5.1. Positional Command-Line Arguments</h4> <div class="paragraph"> <p>None.</p> </div> </div> </div> <div class="sect2"> -<h3 id="br.repair.backup.named.cli.arguments"><a class="anchor" href="#br.repair.backup.named.cli.arguments"></a>81.6. Named Command-Line Arguments</h3> +<h3 id="br.repair.backup.named.cli.arguments"><a class="anchor" href="#br.repair.backup.named.cli.arguments"></a>83.6. Named Command-Line Arguments</h3> <div class="paragraph"> <p>None.</p> </div> <div class="sect3"> -<h4 id="br.repair.backup.example"><a class="anchor" href="#br.repair.backup.example"></a>81.6.1. Example usage</h4> +<h4 id="br.repair.backup.example"><a class="anchor" href="#br.repair.backup.example"></a>83.6.1. Example usage</h4> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> hbase backup repair</code></pre> @@ -18556,13 +18685,13 @@ automatically recover on its own.</p> </div> </div> <div class="sect1"> -<h2 id="br.backup.configuration"><a class="anchor" href="#br.backup.configuration"></a>82. Configuration keys</h2> +<h2 id="br.backup.configuration"><a class="anchor" href="#br.backup.configuration"></a>84. Configuration keys</h2> <div class="sectionbody"> <div class="paragraph"> <p>The backup and restore feature includes both required and optional configuration keys.</p> </div> <div class="sect2"> -<h3 id="_required_properties"><a class="anchor" href="#_required_properties"></a>82.1. 
Required properties</h3> +<h3 id="_required_properties"><a class="anchor" href="#_required_properties"></a>84.1. Required properties</h3> <div class="paragraph"> <p><em>hbase.backup.enable</em>: Controls whether or not the feature is enabled (Default: <code>false</code>). Set this value to <code>true</code>.</p> </div> @@ -18588,7 +18717,7 @@ to <code>org.apache.hadoop.hbase.backup.BackupHFileCleaner</code> or append it t </div> </div> <div class="sect2"> -<h3 id="_optional_properties"><a class="anchor" href="#_optional_properties"></a>82.2. Optional properties</h3> +<h3 id="_optional_properties"><a class="anchor" href="#_optional_properties"></a>84.2. Optional properties</h3> <div class="paragraph"> <p><em>hbase.backup.system.ttl</em>: The time-to-live in seconds of data in the <code>hbase:backup</code> tables (default: forever). This property is only relevant prior to the creation of the <code>hbase:backup</code> table. Use the <code>alter</code> command in the HBase shell to modify the TTL @@ -18609,10 +18738,10 @@ in the Master’s procedure framework (default: 30000).</p> </div> </div> <div class="sect1"> -<h2 id="br.best.practices"><a class="anchor" href="#br.best.practices"></a>83. Best Practices</h2> +<h2 id="br.best.practices"><a class="anchor" href="#br.best.practices"></a>85. Best Practices</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_formulate_a_restore_strategy_and_test_it"><a class="anchor" href="#_formulate_a_restore_strategy_and_test_it"></a>83.1. Formulate a restore strategy and test it.</h3> +<h3 id="_formulate_a_restore_strategy_and_test_it"><a class="anchor" href="#_formulate_a_restore_strategy_and_test_it"></a>85.1. Formulate a restore strategy and test it.</h3> <div class="paragraph"> <p>Before you rely on a backup and restore strategy for your production environment, identify how backups must be performed, and more importantly, how restores must be performed. Test the plan to ensure that it is workable. 
@@ -18628,7 +18757,7 @@ at the whole primary site (fire, earthquake, etc.), the remote backup site can b </div> </div> <div class="sect2"> -<h3 id="_secure_a_full_backup_image_first"><a class="anchor" href="#_secure_a_full_backup_image_first"></a>83.2. Secure a full backup image first.</h3> +<h3 id="_secure_a_full_backup_image_first"><a class="anchor" href="#_secure_a_full_backup_image_first"></a>85.2. Secure a full backup image first.</h3> <div class="paragraph"> <p>As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility @@ -18637,7 +18766,7 @@ is applied on top of the full backup during the restore operation to return you </div> </div> <div class="sect2"> -<h3 id="_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"><a class="anchor" href="#_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"></a>83.3. Define and use backup sets for groups of tables that are logical subsets of the entire dataset.</h3> +<h3 id="_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"><a class="anchor" href="#_define_and_use_backup_sets_for_groups_of_tables_that_are_logical_subsets_of_the_entire_dataset"></a>85.3. Define and use backup sets for groups of tables that are logical subsets of the entire dataset.</h3> <div class="paragraph"> <p>You can group tables into an object called a backup set. 
A backup set can save time when you have a particular group of tables that you expect to repeatedly back up or restore.</p> @@ -18649,7 +18778,7 @@ to the command execution instead of entering all the table names individually.</ </div> </div> <div class="sect2"> -<h3 id="_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"><a class="anchor" href="#_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"></a>83.4. Document the backup and restore strategy, and ideally log information about each backup.</h3> +<h3 id="_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"><a class="anchor" href="#_document_the_backup_and_restore_strategy_and_ideally_log_information_about_each_backup"></a>85.4. Document the backup and restore strategy, and ideally log information about each backup.</h3> <div class="paragraph"> <p>Document the whole process so that the knowledge base can transfer to new administrators after employee turnover. As an extra safety precaution, also log the calendar date, time, and other relevant details about the data of each backup. This metadata @@ -18661,7 +18790,7 @@ accessed by an administrator remotely from the production cluster.</p> </div> </div> <div class="sect1"> -<h2 id="br.s3.backup.scenario"><a class="anchor" href="#br.s3.backup.scenario"></a>84. Scenario: Safeguarding Application Datasets on Amazon S3</h2> +<h2 id="br.s3.backup.scenario"><a class="anchor" href="#br.s3.backup.scenario"></a>86. Scenario: Safeguarding Application Datasets on Amazon S3</h2> <div class="sectionbody"> <div class="paragraph"> <p>This scenario describes how a hypothetical retail business uses backups to safeguard application data and then restore the dataset @@ -18777,7 +18906,7 @@ existing data in the destination. 
In this case, the admin decides to overwrite t </div> </div> <div class="sect1"> -<h2 id="br.data.security"><a class="anchor" href="#br.data.security"></a>85. Security of Backup Data</h2> +<h2 id="br.data.security"><a class="anchor" href="#br.data.security"></a>87. Security of Backup Data</h2> <div class="sectionbody"> <div class="paragraph"> <p>With this feature which makes copying data to remote locations, it’s important to take a moment to clearly state the procedural @@ -18795,7 +18924,7 @@ providing a comparable level of security. This is a manual step which users <str </div> </div> <div class="sect1"> -<h2 id="br.technical.details"><a class="anchor" href="#br.technical.details"></a>86. Technical Details of Incremental Backup and Restore</h2> +<h2 id="br.technical.details"><a class="anchor" href="#br.technical.details"></a>88. Technical Details of Incremental Backup and Restore</h2> <div class="sectionbody"> <div class="paragraph"> <p>HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore @@ -18816,7 +18945,7 @@ Bulk Load utility automatically imports as restored data in the table.</p> </div> </div> <div class="sect1"> -<h2 id="br.filesystem.growth.warning"><a class="anchor" href="#br.filesystem.growth.warning"></a>87. A Warning on File System Growth</h2> +<h2 id="br.filesystem.growth.warning"><a class="anchor" href="#br.filesystem.growth.warning"></a>89. A Warning on File System Growth</h2> <div class="sectionbody"> <div class="paragraph"> <p>As a reminder, incremental backups are implemented via retaining the write-ahead logs which HBase primarily uses for data durability. @@ -18837,7 +18966,7 @@ in the HBase shell. Modifying the configuration property <code>hbase.backup.syst </div> </div> <div class="sect1"> -<h2 id="br.backup.capacity.planning"><a class="anchor" href="#br.backup.capacity.planning"></a>88. 
Capacity Planning</h2> +<h2 id="br.backup.capacity.planning"><a class="anchor" href="#br.backup.capacity.planning"></a>90. Capacity Planning</h2> <div class="sectionbody"> <div class="paragraph"> <p>When designing a distributed system deployment, it is critical that some basic mathematical rigor is executed to ensure sufficient computational @@ -18846,7 +18975,7 @@ bottleneck when estimating the performance of some implementation of backup and data can be read/written.</p> </div> <div class="sect2"> -<h3 id="_full_backups"><a class="anchor" href="#_full_backups"></a>88.1. Full Backups</h3> +<h3 id="_full_backups"><a class="anchor" href="#_full_backups"></a>90.1. Full Backups</h3> <div class="paragraph"> <p>To estimate the duration of a full backup, we have to understand the general actions which are invoked:</p> </div> @@ -18886,7 +19015,7 @@ data at the full 80MB/s and <code>-w</code> is used to limit the job from spawni </div> </div> <div class="sect2"> -<h3 id="_incremental_backup"><a class="anchor" href="#_incremental_backup"></a>88.2. Incremental Backup</h3> +<h3 id="_incremental_backup"><a class="anchor" href="#_incremental_backup"></a>90.2. Incremental Backup</h3> <div class="paragraph"> <p>Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.</p> </div> @@ -18913,7 +19042,7 @@ DistCp MapReduce job would likely dominate the actual time taken to copy the dat </div> </div> <div class="sect1"> -<h2 id="br.limitations"><a class="anchor" href="#br.limitations"></a>89. 
Limitations of the Backup and Restore Utility</h2> <div class="sectionbody"> <div class="paragraph"> <p><strong>Serial backup operations</strong></p> @@ -18991,7 +19120,7 @@ See <a href="#external_apis">Apache HBase External APIs</a> for more information </div> </div> <div class="sect1"> -<h2 id="_examples"><a class="anchor" href="#_examples"></a>90. Examples</h2> +<h2 id="_examples"><a class="anchor" href="#_examples"></a>92. Examples</h2> <div class="sectionbody"> <div class="exampleblock"> <div class="title">Example 40. Create, modify and delete a Table Using Java</div> @@ -19102,7 +19231,7 @@ through custom protocols. For information on using the native HBase APIs, refer </div> </div> <div class="sect1"> -<h2 id="_rest"><a class="anchor" href="#_rest"></a>91. REST</h2> +<h2 id="_rest"><a class="anchor" href="#_rest"></a>93. REST</h2> <div class="sectionbody"> <div class="paragraph"> <p>Representational State Transfer (REST) was introduced in 2000 in the doctoral @@ -19118,7 +19247,7 @@ There is also a nice series of blogs on by Jesse Anderson.</p> </div> <div class="sect2"> -<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>91.1. Starting and Stopping the REST Server</h3> +<h3 id="_starting_and_stopping_the_rest_server"><a class="anchor" href="#_starting_and_stopping_the_rest_server"></a>93.1. Starting and Stopping the REST Server</h3> <div class="paragraph"> <p>The included REST server can run as a daemon which starts an embedded Jetty servlet container and deploys the servlet into it. Use one of the following commands @@ -19145,7 +19274,7 @@ following command if you were running it in the background.</p> </div> </div> <div class="sect2"> -<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>91.2. 
Configuring the REST Server and Client</h3> +<h3 id="_configuring_the_rest_server_and_client"><a class="anchor" href="#_configuring_the_rest_server_and_client"></a>93.2. Configuring the REST Server and Client</h3> <div class="paragraph"> <p>For information about configuring the REST server and client for SSL, as well as <code>doAs</code> impersonation for the REST server, see <a href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on Behalf of the Client</a> and other portions @@ -19153,7 +19282,7 @@ of the <a href="#security">Securing Apache HBase</a> chapter.</p> </div> </div> <div class="sect2"> -<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>91.3. Using REST Endpoints</h3> +<h3 id="_using_rest_endpoints"><a class="anchor" href="#_using_rest_endpoints"></a>93.3. Using REST Endpoints</h3> <div class="paragraph"> <p>The following examples use the placeholder server http://example.com:8000, and the following commands can all be run using <code>curl</code> or <code>wget</code> commands. You can request @@ -19520,7 +19649,7 @@ curl -vi -X PUT \ </table> </div> <div class="sect2"> -<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>91.4. REST XML Schema</h3> +<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>93.4. 
REST XML Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="xml"><span class="tag"><schema</span> <span class="attribute-name">xmlns</span>=<span class="string"><span class="delimiter">"</span><span class="content">http://www.w3.org/2001/XMLSchema</span><span class="delimiter">"</span></span> <span class="attribute-name">xmlns:tns</span>=<span class="string"><span class="delimiter">"</span><span class="content">RESTSchema</span><span class="delimiter">"</span></span><span class="tag">></span> @@ -19678,7 +19807,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect2"> -<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>91.5. REST Protobufs Schema</h3> +<h3 id="protobufs_schema"><a class="anchor" href="#protobufs_schema"></a>93.5. REST Protobufs Schema</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="json"><span class="error">m</span><span class="error">e</span><span class="error">s</span><span class="error">s</span><span class="error">a</span><span class="error">g</span><span class="error">e</span> <span class="error">V</span><span class="error">e</span><span class="error">r</span><span class="error">s</span><span class="error">i</span><span class="error">o</span><span class="error">n</span> { @@ -19786,7 +19915,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>92. Thrift</h2> +<h2 id="_thrift"><a class="anchor" href="#_thrift"></a>94. Thrift</h2> <div class="sectionbody"> <div class="paragraph"> <p>Documentation about Thrift has moved to <a href="#thrift">Thrift API and Filter Language</a>.</p> @@ -19794,7 +19923,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="c"><a class="anchor" href="#c"></a>93. C/C++ Apache HBase Client</h2> +<h2 id="c"><a class="anchor" href="#c"></a>95. 
C/C++ Apache HBase Client</h2> <div class="sectionbody"> <div class="paragraph"> <p>FB’s Chip Turner wrote a pure C/C++ client. @@ -19806,7 +19935,7 @@ curl -vi -X PUT \ </div> </div> <div class="sect1"> -<h2 id="jdo"><a class="anchor" href="#jdo"></a>94. Using Java Data Objects (JDO) with HBase</h2> +<h2 id="jdo"><a class="anchor" href="#jdo"></a>96. Using Java Data Objects (JDO) with HBase</h2> <div class="sectionbody"> <div class="paragraph"> <p><a href="https://db.apache.org/jdo/">Java Data Objects (JDO)</a> is a standard way to @@ -19963,10 +20092,10 @@ a row, get a column value, perform a query, and do some additional HBase operati </div> </div> <div class="sect1"> -<h2 id="scala"><a class="anchor" href="#scala"></a>95. Scala</h2> +<h2 id="scala"><a class="anchor" href="#scala"></a>97. Scala</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>95.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath"><a class="anchor" href="#_setting_the_classpath"></a>97.1. Setting the Classpath</h3> <div class="paragraph"> <p>To use Scala with HBase, your CLASSPATH must include HBase’s classpath as well as the Scala JARs required by your code. First, use the following command on a server @@ -19991,7 +20120,7 @@ your project.</p> </div> </div> <div class="sect2"> -<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>95.2. Scala SBT File</h3> +<h3 id="_scala_sbt_file"><a class="anchor" href="#_scala_sbt_file"></a>97.2. Scala SBT File</h3> <div class="paragraph"> <p>Your <code>build.sbt</code> file needs the following <code>resolvers</code> and <code>libraryDependencies</code> to work with HBase.</p> @@ -20010,7 +20139,7 @@ libraryDependencies ++= Seq( </div> </div> <div class="sect2"> -<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>95.3. 
Example Scala Code</h3> +<h3 id="_example_scala_code"><a class="anchor" href="#_example_scala_code"></a>97.3. Example Scala Code</h3> <div class="paragraph"> <p>This example lists HBase tables, creates a new table, and adds a row to it.</p> </div> @@ -20048,10 +20177,10 @@ println(Bytes.toString(value))</code></pre> </div> </div> <div class="sect1"> -<h2 id="jython"><a class="anchor" href="#jython"></a>96. Jython</h2> +<h2 id="jython"><a class="anchor" href="#jython"></a>98. Jython</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>96.1. Setting the Classpath</h3> +<h3 id="_setting_the_classpath_2"><a class="anchor" href="#_setting_the_classpath_2"></a>98.1. Setting the Classpath</h3> <div class="paragraph"> <p>To use Jython with HBase, your CLASSPATH must include HBase’s classpath as well as the Jython JARs required by your code. First, use the following command on a server @@ -20080,7 +20209,7 @@ $ bin/hbase org.python.util.jython</p> </div> </div> <div class="sect2"> -<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>96.2. Jython Code Examples</h3> +<h3 id="_jython_code_examples"><a class="anchor" href="#_jython_code_examples"></a>98.2. Jython Code Examples</h3> <div class="exampleblock"> <div class="title">Example 42. Table Creation, Population, Get, and Delete with Jython</div> <div class="content"> @@ -20189,7 +20318,7 @@ The Thrift API relies on client and server processes.</p> </div> </div> <div class="sect1"> -<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>97. Filter Language</h2> +<h2 id="thrift.filter_language"><a class="anchor" href="#thrift.filter_language"></a>99. Filter Language</h2> <div class="sectionbody"> <div class="paragraph"> <p>Thrift Filter Language was introduced in HBase 0.92. 
@@ -20200,7 +20329,7 @@ You can find out more about shell integration by using the <code>scan help</code <p>You specify a filter as a string, which is parsed on the server to construct the filter.</p> </div> <div class="sect2"> -<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>97.1. General Filter String Syntax</h3> +<h3 id="general_syntax"><a class="anchor" href="#general_syntax"></a>99.1. General Filter String Syntax</h3> <div class="paragraph"> <p>A simple filter expression is expressed as a string:</p> </div> @@ -20235,7 +20364,7 @@ If single quotes are present in the argument, they must be escaped by an additio </div> </div> <div class="sect2"> -<h3 id="_compound_filters_and_operators"><a class="anchor" href="#_compound_filters_and_operators"></a>97.2. Compound Filters and Operators</h3> +<h3 id="_compound_filters_and_operators"><a class="anchor" href="#_compound_filters_and_operators"></a>99.2. Compound Filters and Operators</h3> <div class="dlist"> <div class="title">Binary Operators</div> <dl> @@ -20277,7 +20406,7 @@ If single quotes are present in the argument, they must be escaped by an additio </div> </div> <div class="sect2"> -<h3 id="_order_of_evaluation"><a class="anchor" href="#_order_of_evaluation"></a>97.3. Order of Evaluation</h3> +<h3 id="_order_of_evaluation"><a class="anchor" href="#_order_of_evaluation"></a>99.3. Order of Evaluation</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -20315,7 +20444,7 @@ is evaluated as </div> </div> <div class="sect2"> -<h3 id="_compare_operator"><a class="anchor" href="#_compare_operator"></a>97.4. Compare Operator</h3> +<h3 id="_compare_operator"><a class="anchor" href="#_compare_operator"></a>99.4. Compare Operator</h3> <div class="paragraph"> <p>The following compare operators are provided:</p> </div> @@ -20349,7 +20478,7 @@ is evaluated as </div> </div> <div class="sect2"> -<h3 id="_comparator"><a class="anchor" href="#_comparator"></a>97.5. 
Comparator</h3> +<h3 id="_comparator"><a class="anchor" href="#_comparator"></a>99.5. Comparator</h3> <div class="paragraph"> <p>A comparator can be any of the following:</p> </div> @@ -20417,7 +20546,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="examplephpclientprogram"><a class="anchor" href="#examplephpclientprogram"></a>97.6. Example PHP Client Program that uses the Filter Language</h3> +<h3 id="examplephpclientprogram"><a class="anchor" href="#examplephpclientprogram"></a>99.6. Example PHP Client Program that uses the Filter Language</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="php"><span class="inline-delimiter"><?</span> @@ -20440,7 +20569,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="_example_filter_strings"><a class="anchor" href="#_example_filter_strings"></a>97.7. Example Filter Strings</h3> +<h3 id="_example_filter_strings"><a class="anchor" href="#_example_filter_strings"></a>99.7. Example Filter Strings</h3> <div class="ulist"> <ul> <li> @@ -20489,7 +20618,7 @@ Only EQUAL and NOT_EQUAL comparisons are valid with this comparator</p> </div> </div> <div class="sect2"> -<h3 id="individualfiltersyntax"><a class="anchor" href="#individualfiltersyntax"></a>97.8. Individual Filter Syntax</h3> +<h3 id="individualfiltersyntax"><a class="anchor" href="#individualfiltersyntax"></a>99.8. Individual Filter Syntax</h3> <div class="dlist"> <dl> <dt class="hdlist1">KeyOnlyFilter</dt> @@ -20632,7 +20761,7 @@ application.</p> </div> </div> <div class="sect1"> -<h2 id="_basic_spark"><a class="anchor" href="#_basic_spark"></a>98. Basic Spark</h2> +<h2 id="_basic_spark"><a class="anchor" href="#_basic_spark"></a>100. Basic Spark</h2> <div class="sectionbody"> <div class="paragraph"> <p>This section discusses Spark HBase integration at the lowest and simplest levels. 
@@ -20763,7 +20892,7 @@ access to HBase</p> </div> </div> <div class="sect1"> -<h2 id="_spark_streaming"><a class="anchor" href="#_spark_streaming"></a>99. Spark Streaming</h2> +<h2 id="_spark_streaming"><a class="anchor" href="#_spark_streaming"></a>101. Spark Streaming</h2> <div class="sectionbody"> <div class="paragraph"> <p><a href="https://spark.apache.org/streaming/">Spark Streaming</a> is a micro batching stream @@ -20860,7 +20989,7 @@ to the HBase Connections in the executors </div> </div> <div class="sect1"> -<h2 id="_bulk_load"><a class="anchor" href="#_bulk_load"></a>100. Bulk Load</h2> +<h2 id="_bulk_load"><a class="anchor" href="#_bulk_load"></a>102. Bulk Load</h2> <div class="sectionbody"> <div class="paragraph"> <p>There are two options for bulk loading data into HBase with Spark. There is the @@ -21068,7 +21197,7 @@ values for this row for all column families.</p> </div> </div> <div class="sect1"> -<h2 id="_sparksql_dataframes"><a class="anchor" href="#_sparksql_dataframes"></a>101. SparkSQL/DataFrames</h2> +<h2 id="_sparksql_dataframes"><a class="anchor" href="#_sparksql_dataframes"></a>103. SparkSQL/DataFrames</h2> <div class="sectionbody"> <div class="paragraph"> <p>HBase-Spark Connector (in HBase-Spark Module) leverages @@ -21088,7 +21217,7 @@ then load HBase DataFrame. After that, users can do integrated query and access in HBase table with SQL query. Following illustrates the basic procedure.</p> </div> <div class="sect2"> -<h3 id="_define_catalog"><a class="anchor" href="#_define_catalog"></a>101.1. Define catalog</h3> +<h3 id="_define_catalog"><a class="anchor" href="#_define_catalog"></a>103.1. 
Define catalog</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">def catalog = s"""{ @@ -21117,7 +21246,7 @@ also has to be defined in details as a column (col0), which has a specific cf (r </div> </div> <div class="sect2"> -<h3 id="_save_the_dataframe"><a class="anchor" href="#_save_the_dataframe"></a>101.2. Save the DataFrame</h3> +<h3 id="_save_the_dataframe"><a class="anchor" href="#_save_the_dataframe"></a>103.2. Save the DataFrame</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">case class HBaseRecord( @@ -21164,7 +21293,7 @@ will create an HBase table with 5 regions and save the DataFrame inside.</p> </div> </div> <div class="sect2"> -<h3 id="_load_the_dataframe"><a class="anchor" href="#_load_the_dataframe"></a>101.3. Load the DataFrame</h3> +<h3 id="_load_the_dataframe"><a class="anchor" href="#_load_the_dataframe"></a>103.3. Load the DataFrame</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">def withCatalog(cat: String): DataFrame = { @@ -21188,7 +21317,7 @@ by <code>withCatalog</code> function could be used to access HBase table, such a </div> </div> <div class="sect2"> -<h3 id="_language_integrated_query"><a class="anchor" href="#_language_integrated_query"></a>101.4. Language Integrated Query</h3> +<h3 id="_language_integrated_query"><a class="anchor" href="#_language_integrated_query"></a>103.4. Language Integrated Query</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") || @@ -21205,7 +21334,7 @@ s.show</code></pre> </div> </div> <div class="sect2"> -<h3 id="_sql_query"><a class="anchor" href="#_sql_query"></a>101.5. SQL Query</h3> +<h3 id="_sql_query"><a class="anchor" href="#_sql_query"></a>103.5. 
SQL Query</h3> <div class="listingblock"> <div class="content"> <pre class="CodeRay highlight"><code data-lang="scala">df.registerTempTable("table1") @@ -21219,7 +21348,7 @@ The lifetime of this temporary table is tied to the SQLContext that was used to </div> </div> <div class="sect2"> -<h3 id="_others"><a class="anchor" href="#_others"></a>101.6. Others</h3> +<h3 id="_others"><a class="anchor" href="#_others"></a>103.6. Others</h3> <div class="exampleblock"> <div class="title">Example 51. Query with different timestamps</div> <div class="content"> @@ -21467,7 +21596,7 @@ coprocessor can severely degrade cluster performance and stability.</p> </div> </div> <div class="sect1"> -<h2 id="_coprocessor_overview"><a class="anchor" href="#_coprocessor_overview"></a>102. Coprocessor Overview</h2> +<h2 id="_coprocessor_overview"><a class="anchor" href="#_coprocessor_overview"></a>104. Coprocessor Overview</h2> <div class="sectionbody"> <div class="paragraph"> <p>In HBase, you fetch data using a <code>Get</code> or <code>Scan</code>, whereas in an RDBMS you use a SQL @@ -21493,7 +21622,7 @@ data, and returns the result to the client.</p> are some analogies which may help to explain some of the benefits of coprocessors.</p> </div> <div class="sect2"> -<h3 id="cp_analogies"><a class="anchor" href="#cp_analogies"></a>102.1. Coprocessor Analogies</h3> +<h3 id="cp_analogies"><a class="anchor" href="#cp_analogies"></a>104.1. Coprocessor Analogies</h3> <div class="dlist"> <dl> <dt class="hdlist1">Triggers and Stored Procedure</dt> @@ -21519,7 +21648,7 @@ before passing the request on to its final destination (or even changing the des </div> </div> <div class="sect2"> -<h3 id="_coprocessor_implementation_overview"><a class="anchor" href="#_coprocessor_implementation_overview"></a>102.2. Coprocessor Implementation Overview</h3> +<h3 id="_coprocessor_implementation_overview"><a class="anchor" href="#_coprocessor_implementation_overview"></a>104.2. 
Coprocessor Implementation Overview</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -21547,10 +21676,10 @@ package.</p> </div> </div> <div class="sect1"> -<h2 id="_types_of_coprocessors"><a class="anchor" href="#_types_of_coprocessors"></a>103. Types of Coprocessors</h2> +<h2 id="_types_of_coprocessors"><a class="anchor" href="#_types_of_coprocessors"></a>105. Types of Coprocessors</h2> <div class="sectionbody"> <div class="sect2"> -<h3 id="_observer_coprocessors"><a class="anchor" href="#_observer_coprocessors"></a>103.1. Observer Coprocessors</h3> +<h3 id="_observer_coprocessors"><a class="anchor" href="#_observer_coprocessors"></a>105.1. Observer Coprocessors</h3> <div class="paragraph"> <p>Observer coprocessors are triggered either before or after a specific event occurs. Observers that happen before an event use methods that start with a <code>pre</code> prefix, @@ -21558,7 +21687,7 @@ such as <a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/cop with a <code>post</code> prefix, such as <a href="https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-"><code>postPut</code></a>.</p> </div> <div class="sect3"> -<h4 id="_use_cases_for_observer_coprocessors"><a class="anchor" href="#_use_cases_for_observer_coprocessors"></a>103.1.1. Use Cases for Observer Coprocessors</h4> +<h4 id="_use_cases_for_observer_coprocessors"><a class="anchor" href="#_use_cases_for_observer_coprocessors"></a>105.1.1. 
Use Cases for Observer Coprocessors</h4> <div class="dlist"> <dl> <dt class="hdlist1">Security</dt> @@ -21583,7 +21712,7 @@ a coprocessor to use the <code>prePut</code> method on <code>user</code> to inse </div> </div> <div class="sect3"> -<h4 id="_types_of_observer_coprocessor"><a class="anchor" href="#_types_of_observer_coprocessor"></a>103.1.2. Types of Observer Coprocessor</h4> +<h4 id="_types_of_observer_coprocessor"><a class="anchor" href="#_types_of_observer_coprocessor"></a>105.1.2. Types of Observer Coprocessor</h4> <div class="dlist"> <dl> <dt class="hdlist1">RegionObserver</dt> @@ -21619,7 +21748,7 @@ Log (WAL). See </div> </div> <div class="sect2"> -<h3 id="cpeps"><a class="anchor" href="#cpeps"></a>103.2. Endpoint Coprocessor</h3> +<h3 id="cpeps"><a class="anchor" href="#cpeps"></a>105.2. Endpoint Coprocessor</h3> <div class="paragraph"> <p>Endpoint processors allow you to perform computation at the location of the data. See <a href="#cp_analogies">Coprocessor Analogy</a>. An example is the need to calculate a running @@ -21663,14 +21792,14 @@ change.</p> </div> </div> <div class="sect1"> -<h2 id="cp_loading"><a class="anchor" href="#cp_loading"></a>104. Loading Coprocessors</h2> +<h2 id="cp_loading"><a class="anchor" href="#cp_loading"></a>106. Loading Coprocessors</h2> <div class="sectionbody"> <div class="paragraph"> <p>To make your coprocessor available to HBase, it must be <em>loaded</em>, either statically (through the HBase configuration) or dynamically (using HBase Shell or the Java API).</p> </div> <div class="sect2"> -<h3 id="_static_loading"><a class="anchor" href="#_static_loading"></a>104.1. Static Loading</h3> +<h3 id="_static_loading"><a class="anchor" href="#_static_loading"></a>106.1. Static Loading</h3> <div class="paragraph"> <p>Follow these steps to statically load your coprocessor. 
Keep in mind that you must restart HBase to unload a coprocessor that has been loaded statically.</p> @@ -21739,7 +21868,7 @@ HBase installation.</p> </div> </div> <div class="sect2"> -<h3 id="_static_unloading"><a class="anchor" href="#_static_unloading"></a>104.2. Static Unloading</h3> +<h3 id="_static_unloading"><a class="anchor" href="#_static_unloading"></a>106.2. Static Unloading</h3> <div class="olist arabic"> <ol class="arabic"> <li> @@ -21756,7 +21885,7 @@ directory.</p> </div> </div> <div class="sect2"> -<h3 id="_dynamic_loading"><a class="anchor" href="#_dynamic_loading"></a>104.3. Dynamic Loading</h3> +<h3 id="_dynamic_loading"><a class="anchor" href="#_dynamic_loading"></a>106.3. Dynamic Loading</h3> <div class="paragraph"> <p>You can also load a coprocessor dynamically, without restarting HBase. This may seem preferable to static loading, but dynamically loaded coprocessors are loaded on a @@ -21798,7 +21927,7 @@ dependencies.</p> </table> </div> <div class="sect3"> -<h4 id="load_coprocessor_in_shell"><a class="anchor" href="#load_coprocessor_in_shell"></a>104.3.1. Using HBase Shell</h4> +<h4 id="load_coprocessor_in_shell"><a class="anchor" href="#load_coprocessor_in_shell"></a>106.3.1. Using HBase Shell</h4> <div class="olist arabic"> <ol class="arabic"> <li> @@ -21874,7 +22003,7 @@ case the framework will assign a default priority value.</p> </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api_all_hbase_versions"><a class="anchor" href="#_using_the_java_api_all_hbase_versions"></a>104.3.2. Using the Java API (all HBase versions)</h4> +<h4 id="_using_the_java_api_all_hbase_versions"><a class="anchor" href="#_using_the_java_api_all_hbase_versions"></a>106.3.2. 
Using the Java API (all HBase versions)</h4> <div class="paragraph"> <p>The following Java code shows how to use the <code>setValue()</code> method of <code>HTableDescriptor</code> to load a coprocessor on the <code>users</code> table.</p> @@ -21903,7 +22032,7 @@ admin.enableTable(tableName);</code></pre> </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api_hbase_0_96_only"><a class="anchor" href="#_using_the_java_api_hbase_0_96_only"></a>104.3.3. Using the Java API (HBase 0.96+ only)</h4> +<h4 id="_using_the_java_api_hbase_0_96_only"><a class="anchor" href="#_using_the_java_api_hbase_0_96_only"></a>106.3.3. Using the Java API (HBase 0.96+ only)</h4> <div class="paragraph"> <p>In HBase 0.96 and newer, the <code>addCoprocessor()</code> method of <code>HTableDescriptor</code> provides an easier way to load a coprocessor dynamically.</p> @@ -21946,9 +22075,9 @@ verifies whether the given class is actually contained in the jar file. </div> </div> <div class="sect2"> -<h3 id="_dynamic_unloading"><a class="anchor" href="#_dynamic_unloading"></a>104.4. Dynamic Unloading</h3> +<h3 id="_dynamic_unloading"><a class="anchor" href="#_dynamic_unloading"></a>106.4. Dynamic Unloading</h3> <div class="sect3"> -<h4 id="_using_hbase_shell"><a class="anchor" href="#_using_hbase_shell"></a>104.4.1. Using HBase Shell</h4> +<h4 id="_using_hbase_shell"><a class="anchor" href="#_using_hbase_shell"></a>106.4.1. Using HBase Shell</h4> <div class="olist arabic"> <ol class="arabic"> <li> @@ -21979,7 +22108,7 @@ verifies whether the given class is actually contained in the jar file. </div> </div> <div class="sect3"> -<h4 id="_using_the_java_api"><a class="anchor" href="#_using_the_java_api"></a>104.4.2. Using the Java API</h4> +<h4 id="_using_the_java_api"><a class="anchor" href="#_using_the_java_api"></a>106.4.2. 
Using the Java API</h4> <div class="paragraph"> <p>Reload the table definition without setting the value of the coprocessor either by using <code>setValue()</code> or <code>addCoprocessor()</code> methods. This will remove any coprocessor @@ -22013,7 +22142,7 @@ admin.enableTable(tableName);</code></pre> </div> </div> <div class="sect1"> -<h2 id="cp_example"><a class="anchor" href="#cp_example"></a>105. Examples</h2> +<h2 id="cp_example"><a class="anchor" href="#cp_example"></a>107. Examples</h2> <div class="sectionbody"> <div class="paragraph"> <p>HBase ships examples for Observer Coprocessor.</p> @@ -22084,7 +22213,7 @@ of the <code>use
<TRUNCATED>
