http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/performance.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml deleted file mode 100644 index 50de8aa..0000000 --- a/src/main/docbkx/performance.xml +++ /dev/null @@ -1,775 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<chapter version="5.0" xml:id="performance" - xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> -<!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - <title>Apache HBase Performance Tuning</title> - - <section xml:id="perf.os"> - <title>Operating System</title> - <section xml:id="perf.os.ram"> - <title>Memory</title> - <para>RAM, RAM, RAM. 
Don't starve HBase.</para> - </section> - <section xml:id="perf.os.64"> - <title>64-bit</title> - <para>Use a 64-bit platform (and 64-bit JVM).</para> - </section> - <section xml:id="perf.os.swap"> - <title>Swapping</title> - <para>Watch out for swapping. Set swappiness to 0.</para> - </section> - </section> - <section xml:id="perf.network"> - <title>Network</title> - <para> - Perhaps the most important factor in avoiding network issues that degrade Hadoop and HBase performance is the switching hardware that is used; decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more). - </para> - <para> - Important items to consider: - <itemizedlist> - <listitem>Switching capacity of the device</listitem> - <listitem>Number of systems connected</listitem> - <listitem>Uplink capacity</listitem> - </itemizedlist> - </para> - <section xml:id="perf.network.1switch"> - <title>Single Switch</title> - <para>The single most important factor in this configuration is that the switching capacity of the hardware is capable of handling the traffic that can be generated by all systems connected to the switch. Some lower-priced commodity hardware can have slower switching capacity than its fully connected ports could utilize. - </para> - </section> - <section xml:id="perf.network.2switch"> - <title>Multiple Switches</title> - <para>Multiple switches are a potential pitfall in the architecture. The most common configuration of lower-priced hardware is a simple 1Gbps uplink from one switch to another. This often-overlooked pinch point can easily become a bottleneck for cluster communication. Especially with MapReduce jobs that are both reading and writing a lot of data, the communication across this uplink could be saturated. 
- </para> - <para>Mitigation of this issue is fairly simple and can be accomplished in multiple ways: - <itemizedlist> - <listitem>Use appropriate hardware for the scale of the cluster you're attempting to build.</listitem> - <listitem>Use larger single-switch configurations, i.e. a single 48-port switch as opposed to two 24-port switches.</listitem> - <listitem>Configure port trunking for uplinks to utilize multiple interfaces to increase cross-switch bandwidth.</listitem> - </itemizedlist> - </para> - </section> - <section xml:id="perf.network.multirack"> - <title>Multiple Racks</title> - <para>Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas: - <itemizedlist> - <listitem>Poor switch capacity performance</listitem> - <listitem>Insufficient uplink to another rack</listitem> - </itemizedlist> - If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks. The downside of this method, however, is the overhead of ports that could otherwise be used for hosts. For example, creating an 8Gbps port channel from rack A to rack B using 8 of your 24 ports to communicate between racks gives you a poor ROI; using too few, however, can mean you're not getting the most out of your cluster. - </para> - <para>Using 10GbE links between racks will greatly increase performance, and assuming your switches support a 10GbE uplink or allow for an expansion card, this lets you save your ports for machines as opposed to uplinks. - </para> - </section> - <section xml:id="perf.network.ints"> - <title>Network Interfaces</title> - <para>Are all the network interfaces functioning correctly? Are you sure? 
See the Troubleshooting Case Study in <xref linkend="casestudies.slownode"/>. - </para> - </section> - </section> <!-- network --> - - <section xml:id="jvm"> - <title>Java</title> - - <section xml:id="gc"> - <title>The Garbage Collector and Apache HBase</title> - - <section xml:id="gcpause"> - <title>Long GC pauses</title> - - <para xml:id="mslab">In his presentation, <link xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</link>, Todd Lipcon describes two cases of stop-the-world garbage collections that are common in HBase, especially during loading: CMS failure modes, and old generation heap fragmentation. To address the first, start the CMS earlier than default by adding <code>-XX:CMSInitiatingOccupancyFraction</code> and setting it down from the default. Start at 60 or 70 percent (the lower you bring the threshold down, the more GC is done and the more CPU is used). To address the second issue, fragmentation, Todd added an experimental facility, <indexterm><primary>MSLAB</primary></indexterm>, that must be explicitly enabled in Apache HBase 0.90.x (it is on by default in Apache HBase 0.92.x). Set <code>hbase.hregion.memstore.mslab.enabled</code> to true in your <classname>Configuration</classname>. See the cited slides for background and detail<footnote><para>The latest JVMs do better with regard to fragmentation, so make sure you are running a recent release. Read down in the message, <link xlink:href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html">Identifying concurrent mode failures caused by fragmentation</link>.</para></footnote>. Be aware that when enabled, each MemStore instance will occupy at least an MSLAB instance of memory. 
If you have thousands of regions, or lots of regions each with many column families, this allocation of MSLAB may be responsible for a good portion of your heap allocation and in an extreme case cause you to OOME. In this case, disable MSLAB, lower the amount of memory it uses, or host fewer regions per server. - </para> - <para>If you have a write-heavy workload, check out <link xlink:href="https://issues.apache.org/jira/browse/HBASE-8163">HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB</link>. It describes configurations to lower the amount of young GC during write-heavy loadings. If you do not have HBASE-8163 installed, and you are trying to improve your young GC times, one trick to consider -- courtesy of Liang Xie -- is to set the GC config <varname>-XX:PretenureSizeThreshold</varname> in <filename>hbase-env.sh</filename> to be just smaller than the size of <varname>hbase.hregion.memstore.mslab.chunksize</varname> so MSLAB allocations happen in the tenured space directly rather than first in the young gen. You'd do this because these MSLAB allocations are likely to make it to the old gen anyway; rather than pay the price of copies between the s0 and s1 survivor spaces followed by the copy from young to old gen after the MSLABs have achieved sufficient tenure, save a bit of young-GC churn and allocate in the old gen directly. - </para> - <para>For more information about GC logs, see <xref linkend="trouble.log.gc" />. - </para> - </section> - </section> - </section> - - <section xml:id="perf.configurations"> - <title>HBase Configurations</title> - - <para>See <xref linkend="recommended_configurations" />.</para> - - - <section xml:id="perf.number.of.regions"> - <title>Number of Regions</title> - - <para>The number of regions for an HBase table is driven by the <xref linkend="bigger.regions" />. 
Also, see the architecture section on <xref linkend="arch.regions.size" />.</para> - </section> - - <section xml:id="perf.compactions.and.splits"> - <title>Managing Compactions</title> - - <para>For larger systems, managing <link linkend="disable.splitting">compactions and splits</link> may be something you want to consider.</para> - </section> - - <section xml:id="perf.handlers"> - <title><varname>hbase.regionserver.handler.count</varname></title> - <para>See <xref linkend="hbase.regionserver.handler.count"/>. - </para> - </section> - <section xml:id="perf.hfile.block.cache.size"> - <title><varname>hfile.block.cache.size</varname></title> - <para>See <xref linkend="hfile.block.cache.size"/>. A memory setting for the RegionServer process. - </para> - </section> - <section xml:id="perf.rs.memstore.upperlimit"> - <title><varname>hbase.regionserver.global.memstore.upperLimit</varname></title> - <para>See <xref linkend="hbase.regionserver.global.memstore.upperLimit"/>. This memory setting is often adjusted for the RegionServer process depending on needs. - </para> - </section> - <section xml:id="perf.rs.memstore.lowerlimit"> - <title><varname>hbase.regionserver.global.memstore.lowerLimit</varname></title> - <para>See <xref linkend="hbase.regionserver.global.memstore.lowerLimit"/>. This memory setting is often adjusted for the RegionServer process depending on needs. - </para> - </section> - <section xml:id="perf.hstore.blockingstorefiles"> - <title><varname>hbase.hstore.blockingStoreFiles</varname></title> - <para>See <xref linkend="hbase.hstore.blockingStoreFiles"/>. If there is blocking in the RegionServer logs, increasing this can help. - </para> - </section> - <section xml:id="perf.hregion.memstore.block.multiplier"> - <title><varname>hbase.hregion.memstore.block.multiplier</varname></title> - <para>See <xref linkend="hbase.hregion.memstore.block.multiplier"/>. If there is enough RAM, increasing this can help. 
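- For example, a sketch of what raising it might look like in <filename>hbase-site.xml</filename> (the value shown is illustrative, not a recommendation for your cluster):

```xml
<!-- hbase-site.xml: a region's updates are blocked once its memstore
     reaches (multiplier x hbase.hregion.memstore.flush.size).
     Raising the multiplier trades RAM headroom for fewer write stalls. -->
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
```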
- </para> - </section> - <section xml:id="hbase.regionserver.checksum.verify"> - <title><varname>hbase.regionserver.checksum.verify</varname></title> - <para>Have HBase write checksums into the datablock and save having to do a separate checksum seek whenever you read.</para> - - <para>See <xref linkend="hbase.regionserver.checksum.verify"/>, <xref linkend="hbase.hstore.bytes.per.checksum"/> and <xref linkend="hbase.hstore.checksum.algorithm"/>. For more information, see the release note on <link xlink:href="https://issues.apache.org/jira/browse/HBASE-5074">HBASE-5074 support checksums in HBase block cache</link>. - </para> - </section> - - </section> - - - - - <section xml:id="perf.zookeeper"> - <title>ZooKeeper</title> - <para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part about having a dedicated disk. - </para> - </section> - <section xml:id="perf.schema"> - <title>Schema Design</title> - - <section xml:id="perf.number.of.cfs"> - <title>Number of Column Families</title> - <para>See <xref linkend="number.of.cfs" />.</para> - </section> - <section xml:id="perf.schema.keys"> - <title>Key and Attribute Lengths</title> - <para>See <xref linkend="keysize" />. See also <xref linkend="perf.compression.however" /> for compression caveats.</para> - </section> - <section xml:id="schema.regionsize"><title>Table RegionSize</title> - <para>The regionsize can be set on a per-table basis via <code>setFileSize</code> on <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link> in the event that certain tables require a different regionsize than the configured default. - </para> - <para>See <xref linkend="perf.number.of.regions"/> for more information. - </para> - </section> - <section xml:id="schema.bloom"> - <title>Bloom Filters</title> - <para>Bloom Filters can be enabled per-ColumnFamily. 
- Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL)</code> to enable blooms per Column Family. Default = <varname>NONE</varname> for no bloom filters. If <varname>ROW</varname>, the hash of the row will be added to the bloom on each insert. If <varname>ROWCOL</varname>, the hash of the row + column family + column family qualifier will be added to the bloom on each key insert.</para> - <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and <xref linkend="blooms"/> for more information, or this answer on Quora: <link xlink:href="http://www.quora.com/How-are-bloom-filters-used-in-HBase">How are bloom filters used in HBase?</link>. - </para> - </section> - <section xml:id="schema.cf.blocksize"><title>ColumnFamily BlockSize</title> - <para>The blocksize can be configured for each ColumnFamily in a table, and defaults to 64k. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved). - </para> - <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and <xref linkend="store"/> for more information. - </para> - </section> - <section xml:id="cf.in.memory"> - <title>In-Memory ColumnFamilies</title> - <para>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table will be in memory. - </para> - <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information. 
- </para> - </section> - <section xml:id="perf.compression"> - <title>Compression</title> - <para>Production systems should use compression with their ColumnFamily definitions. See <xref linkend="compression" /> for more information. - </para> - <section xml:id="perf.compression.however"><title>However...</title> - <para>Compression deflates data <emphasis>on disk</emphasis>. When it's in-memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated. So while using ColumnFamily compression is a best practice, it's not going to completely eliminate the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. - </para> - <para>See <xref linkend="keysize" /> for schema design tips, and <xref linkend="keyvalue"/> for more information on how HBase stores data internally. - </para> - </section> - </section> <!-- perf schema --> - - <section xml:id="perf.general"> - <title>HBase General Patterns</title> - <section xml:id="perf.general.constants"> - <title>Constants</title> - <para>When people get started with HBase they have a tendency to write code that looks like this: -<programlisting> -Get get = new Get(rowkey); -Result r = htable.get(get); -byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value -</programlisting> - But especially when inside loops (and MapReduce jobs), converting the columnFamily and column names to byte arrays repeatedly is surprisingly expensive. It's better to use constants for the byte arrays, like this: -<programlisting> -public static final byte[] CF = "cf".getBytes(); -public static final byte[] ATTR = "attr".getBytes(); -... 
-Get get = new Get(rowkey); -Result r = htable.get(get); -byte[] b = r.getValue(CF, ATTR); // returns current version of value -</programlisting> - </para> - </section> - - </section> - <section xml:id="perf.writing"> - <title>Writing to HBase</title> - - <section xml:id="perf.batch.loading"> - <title>Batch Loading</title> - <para>Use the bulk load tool if you can. See <xref linkend="arch.bulk.load"/>. Otherwise, pay attention to the below. - </para> - </section> <!-- batch loading --> - - <section xml:id="precreate.regions"> - <title> - Table Creation: Pre-Creating Regions - </title> -<para> -Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too many regions can actually degrade performance. -</para> - <para>There are two different approaches to pre-creating splits. The first approach is to rely on the default <code>HBaseAdmin</code> strategy (which is implemented in <code>Bytes.split</code>)... - </para> -<programlisting> -byte[] startKey = ...; // your lowest key -byte[] endKey = ...; // your highest key -int numberOfRegions = ...; // # of regions to create -admin.createTable(table, startKey, endKey, numberOfRegions); -</programlisting> - <para>And the other approach is to define the splits yourself... - </para> -<programlisting> -byte[][] splits = ...; // create your own splits -admin.createTable(table, splits); -</programlisting> -<para> - See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your keyspace and pre-creating regions. 
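- To make the first approach concrete, the arithmetic behind an equal-width split is roughly the following self-contained sketch. This is illustrative only: the real logic lives in <code>org.apache.hadoop.hbase.util.Bytes.split</code> and handles key lengths and edge cases this sketch ignores.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class SplitSketch {

    // Divide the keyspace between startKey and endKey into numberOfRegions
    // equal-width slices, returning the numberOfRegions - 1 boundary keys.
    static byte[][] splitKeys(byte[] startKey, byte[] endKey, int numberOfRegions) {
        BigInteger lo = new BigInteger(1, startKey);
        BigInteger hi = new BigInteger(1, endKey);
        BigInteger range = hi.subtract(lo);
        byte[][] splits = new byte[numberOfRegions - 1][];
        for (int i = 1; i < numberOfRegions; i++) {
            BigInteger boundary = lo.add(
                range.multiply(BigInteger.valueOf(i))
                     .divide(BigInteger.valueOf(numberOfRegions)));
            byte[] b = boundary.toByteArray();
            // BigInteger may prepend a zero sign byte; strip it.
            if (b.length > 1 && b[0] == 0) {
                b = Arrays.copyOfRange(b, 1, b.length);
            }
            splits[i - 1] = b;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions over the single-byte keyspace 0x00..0xFF
        // yields boundaries at 63, 127 and 191.
        for (byte[] split : splitKeys(new byte[] {0x00}, new byte[] {(byte) 0xFF}, 4)) {
            System.out.println(split[0] & 0xFF);
        }
    }
}
```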
- </para> - </section> - <section xml:id="def.log.flush"> - <title> - Table Creation: Deferred Log Flush - </title> -<para> -The default behavior for Puts using the Write Ahead Log (WAL) is that <classname>HLog</classname> edits will be written immediately. If deferred log flush is used, WAL edits are kept in memory until the flush period. The benefit is aggregated and asynchronous <classname>HLog</classname> writes, but the potential downside is that if the RegionServer goes down the yet-to-be-flushed edits are lost. This is safer, however, than not using the WAL at all with Puts. -</para> -<para> -Deferred log flush can be configured on tables via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>. The default value of <varname>hbase.regionserver.optionallogflushinterval</varname> is 1000ms. -</para> - </section> - - <section xml:id="perf.hbase.client.autoflush"> - <title>HBase Client: AutoFlush</title> - - <para>When performing a lot of Puts, make sure that setAutoFlush is set to false on your <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instance. Otherwise, the Puts will be sent one at a time to the RegionServer. Puts added via <code>htable.add(Put)</code> and <code>htable.add(List&lt;Put&gt;)</code> wind up in the same write buffer. If <code>autoFlush = false</code>, these messages are not sent until the write buffer is filled. To explicitly flush the messages, call <methodname>flushCommits</methodname>. Calling <methodname>close</methodname> on the <classname>HTable</classname> instance will invoke <methodname>flushCommits</methodname>.</para> - </section> - <section xml:id="perf.hbase.client.putwal"> - <title>HBase Client: Turn off WAL on Puts</title> - <para>A frequently discussed option for increasing throughput on <classname>Put</classname>s is to call <code>writeToWAL(false)</code>. 
Turning this off means that the RegionServer will <emphasis>not</emphasis> write the <classname>Put</classname> to the Write Ahead Log, only into the memstore. HOWEVER, the consequence is that if there is a RegionServer failure <emphasis>there will be data loss</emphasis>. If <code>writeToWAL(false)</code> is used, do so with extreme caution. You may find in actuality that it makes little difference if your load is well distributed across the cluster. - </para> - <para>In general, it is best to use the WAL for Puts, and where loading throughput is a concern to use <link linkend="perf.batch.loading">bulk loading</link> techniques instead. - </para> - </section> - <section xml:id="perf.hbase.client.regiongroup"> - <title>HBase Client: Group Puts by RegionServer</title> - <para>In addition to using the writeBuffer, grouping <classname>Put</classname>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility <classname>HTableUtil</classname> currently on TRUNK that does this, but you can either copy that or implement your own version for those still on 0.90.x or earlier. - </para> - </section> - <section xml:id="perf.hbase.write.mr.reducer"> - <title>MapReduce: Skip The Reducer</title> - <para>When writing a lot of data to an HBase table from a MR job (e.g., with <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. - </para> - <para>For summary jobs where HBase is used as a source and a sink, writes will be coming from the Reducer step (e.g., summarize values then write out the result). 
- This is a different processing problem than the one above. - </para> - </section> - - <section xml:id="perf.one.region"> - <title>Anti-Pattern: One Hot Region</title> - <para>If all your data is being written to one region at a time, then re-read the section on processing <link linkend="timeseries">timeseries</link> data.</para> - <para>Also, if you are pre-splitting regions and all your data is <emphasis>still</emphasis> winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. As the HBase client communicates directly with the RegionServers, the region for a given row key can be obtained via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29">HTable.getRegionLocation</link>. - </para> - <para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/>.</para> - </section> - - </section> <!-- writing --> - - <section xml:id="perf.reading"> - <title>Reading from HBase</title> - - <section xml:id="perf.hbase.client.caching"> - <title>Scan Caching</title> - - <para>If HBase is used as an input source for a MapReduce job, for example, make sure that the input <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> instance to the MapReduce job has <methodname>setCaching</methodname> set to something greater than the default (which is 1). Using the default value means that the map task will make a call back to the RegionServer for every record processed. Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed. 
There is a cost/benefit to having a large cache value because it costs more memory for both the client and the RegionServer, so bigger isn't always better.</para> - <section xml:id="perf.hbase.client.caching.mr"> - <title>Scan Caching in MapReduce Jobs</title> - <para>Scan settings in MapReduce jobs deserve special attention. Timeouts (e.g., UnknownScannerException) can result in Map tasks if it takes too long to process a batch of records before the client goes back to the RegionServer for the next set of data. This problem can occur because there is non-trivial processing occurring per row. If you process rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes), then set caching lower. - </para> - <para>Timeouts can also happen in a non-MapReduce use case (i.e., a single-threaded HBase client doing a Scan), but the processing that is often performed in MapReduce jobs tends to exacerbate this issue. - </para> - </section> - </section> - <section xml:id="perf.hbase.client.selection"> - <title>Scan Attribute Selection</title> - - <para>Whenever a Scan is used to process large numbers of rows (and especially when used as a MapReduce source), be aware of which attributes are selected. If <code>scan.addFamily</code> is called then <emphasis>all</emphasis> of the attributes in the specified ColumnFamily will be returned to the client. If only a small number of the available attributes are to be processed, then only those attributes should be specified in the input scan because attribute over-selection is a non-trivial performance penalty over large datasets. - </para> - </section> - <section xml:id="perf.hbase.client.seek"> - <title>Avoid scan seeks</title> - <para>When columns are selected explicitly with <code>scan.addColumn</code>, HBase will schedule seek operations to seek between the selected columns. When rows have few columns and each column has only a few versions this can be inefficient. 
A seek operation is generally slower if it does not seek at least past 5-10 columns/versions or 512-1024 bytes.</para> - <para>In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute, <code>Scan.HINT_LOOKAHEAD</code>, can be set on the Scan object. The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:<programlisting> -Scan scan = new Scan(); -scan.addColumn(...); -scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2)); -table.getScanner(scan); -</programlisting></para> - </section> - <section xml:id="perf.hbase.mr.input"> - <title>MapReduce - Input Splits</title> - <para>For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to have the same Input Split (i.e., the RegionServer serving the data), see the Troubleshooting Case Study in <xref linkend="casestudies.slownode"/>. - </para> - </section> - - <section xml:id="perf.hbase.client.scannerclose"> - <title>Close ResultScanners</title> - - <para>This isn't so much about improving performance but rather <emphasis>avoiding</emphasis> performance problems. If you forget to close <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html">ResultScanners</link> you can cause problems on the RegionServers. Always have ResultScanner processing enclosed in try/finally blocks... <programlisting> -Scan scan = new Scan(); -// set attrs... -ResultScanner rs = htable.getScanner(scan); -try { - for (Result r = rs.next(); r != null; r = rs.next()) { - // process result... - } -} finally { - rs.close(); // always close the ResultScanner! 
-} -htable.close();</programlisting></para> - </section> - - <section xml:id="perf.hbase.client.blockcache"> - <title>Block Cache</title> - - <para><link - xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> - instances can be set to use the block cache in the RegionServer via the - <methodname>setCacheBlocks</methodname> method. For input Scans to MapReduce jobs, this should be - <varname>false</varname>. For frequently accessed rows, it is advisable to use the block - cache.</para> - </section> - <section xml:id="perf.hbase.client.rowkeyonly"> - <title>Optimal Loading of Row Keys</title> - <para>When performing a table <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">scan</link> - where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a - <varname>MUST_PASS_ALL</varname> operator to the scanner using <methodname>setFilter</methodname>. The filter list - should include both a <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html">FirstKeyOnlyFilter</link> - and a <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html">KeyOnlyFilter</link>. - Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk - and minimal network traffic to the client for a single row. - </para> - </section> - <section xml:id="perf.hbase.read.dist"> - <title>Concurrency: Monitor Data Spread</title> - <para>When performing a high number of concurrent reads, monitor the data spread of the target tables. If the target table(s) have - too few regions then the reads could likely be served from too few nodes. 
</para> - <para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/>.</para> - </section> - <section xml:id="blooms"> - <title>Bloom Filters</title> - <para>Enabling Bloom Filters can save you having to go to disk and can help improve read latencies.</para> - <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed in <link xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200 Add bloomfilters</link>.<footnote> - <para>For a description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the <emphasis>Development Process</emphasis> section of the document <link xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters in HBase</link> attached to <link xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para> - </footnote><footnote> - <para>The bloom filters described here are actually version two of blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the <link xlink:href="http://www.one-lab.org">European Commission One-Lab Project 034819</link>. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch, though again it starts with the one-lab work.</para> - </footnote></para> - <para>See also <xref linkend="schema.bloom" />. 
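- As a rough feel for the space/accuracy trade-off behind the <varname>io.hfile.bloom.error.rate</varname> rule of thumb in this section (halving the false-positive rate costs about one extra bit per bloom entry), the arithmetic can be sketched as bits per entry ≈ log2(1/p). This is a simplified, illustrative model only, not HBase's actual sizing code:

```java
public class BloomBitsSketch {

    // Simplified rule of thumb: bits per bloom entry ~ log2(1/p), so
    // halving the false-positive rate p costs one extra bit per entry.
    // (The optimal-bloom formula is -ln(p) / (ln 2)^2; HBase's own
    // sizing lives in its bloom filter implementation, not here.)
    static double bitsPerEntry(double falsePositiveRate) {
        return Math.log(1.0 / falsePositiveRate) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("p = 1%%   -> %.2f bits/entry%n", bitsPerEntry(0.01));
        System.out.printf("p = 0.5%% -> %.2f bits/entry%n", bitsPerEntry(0.005));
    }
}
```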
- </para> - - <section xml:id="bloom_footprint"> - <title>Bloom StoreFile footprint</title> - - <para>Bloom filters add an entry to the <classname>StoreFile</classname> general <classname>FileInfo</classname> data structure and then two extra entries to the <classname>StoreFile</classname> metadata section.</para> - - <section> - <title>BloomFilter in the <classname>StoreFile</classname> <classname>FileInfo</classname> data structure</title> - - <para><classname>FileInfo</classname> has a <varname>BLOOM_FILTER_TYPE</varname> entry which is set to <varname>NONE</varname>, <varname>ROW</varname> or <varname>ROWCOL</varname>.</para> - </section> - - <section> - <title>BloomFilter entries in <classname>StoreFile</classname> metadata</title> - - <para><varname>BLOOM_FILTER_META</varname> holds the Bloom size, the hash function used, etc. It is small and is cached on <classname>StoreFile.Reader</classname> load.</para> - <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter data. Obtained on-demand. Stored in the LRU cache, if it is enabled (it is enabled by default).</para> - </section> - </section> - <section xml:id="config.bloom"> - <title>Bloom Filter Configuration</title> - <section> - <title><varname>io.hfile.bloom.enabled</varname> global kill switch</title> - - <para><code>io.hfile.bloom.enabled</code> in <classname>Configuration</classname> serves as the kill switch in case something goes wrong. Default = <varname>true</varname>.</para> - </section> - - <section> - <title><varname>io.hfile.bloom.error.rate</varname></title> - - <para><varname>io.hfile.bloom.error.rate</varname> = average false positive rate. Default = 1%. Decrease the rate by ½ (e.g. to .5%) == +1 bit per bloom entry.</para> - </section> - - <section> - <title><varname>io.hfile.bloom.max.fold</varname></title> - - <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum fold rate. Most people should leave this alone. 
Default = 7, or can - collapse to at least 1/128th of original size. See the - <emphasis>Development Process</emphasis> section of the document <link - xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters - in HBase</link> for more on what this option means.</para> - </section> - </section> - </section> <!-- bloom --> - - </section> <!-- reading --> - - <section xml:id="perf.deleting"> - <title>Deleting from HBase</title> - <section xml:id="perf.deleting.queue"> - <title>Using HBase Tables as Queues</title> - <para>HBase tables are sometimes used as queues. In this case, special care must be taken to regularly perform major compactions on tables used in - this manner. As is documented in <xref linkend="datamodel" />, marking rows as deleted creates additional StoreFiles which then need to be processed - on reads. Tombstones only get cleaned up with major compactions. - </para> - <para>See also <xref linkend="compaction" /> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>. - </para> - </section> - <section xml:id="perf.deleting.rpc"> - <title>Delete RPC Behavior</title> - <para>Be aware that <code>htable.delete(Delete)</code> doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation. - For a large number of deletes, consider <code>htable.delete(List)</code>. - </para> - <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">HTable.delete</link> - </para> - </section> - </section> <!-- deleting --> - - <section xml:id="perf.hdfs"><title>HDFS</title> - <para>Because HBase runs on <xref linkend="arch.hdfs" />, it is important to understand how it works and how it affects - HBase.
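The io.hfile.bloom.* switches described in the Bloom Filter Configuration section above are ordinary Configuration properties. As an illustration only (values shown are the documented defaults; the fractional form of the error rate is an assumption to verify against your HBase version), an hbase-site.xml fragment might look like:

```xml
<!-- Illustrative sketch: documented defaults for the Bloom filter switches. -->
<property>
  <name>io.hfile.bloom.enabled</name>
  <value>true</value> <!-- global kill switch -->
</property>
<property>
  <name>io.hfile.bloom.error.rate</name>
  <value>0.01</value> <!-- average false positive rate, i.e. 1% -->
</property>
<property>
  <name>io.hfile.bloom.max.fold</name>
  <value>7</value> <!-- guaranteed minimum fold rate; most should leave alone -->
</property>
```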
- </para> - <section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency Reads</title> - <para>The original use-case for HDFS was batch processing. As such, low-latency reads were historically not a priority. - With the increased adoption of Apache HBase this is changing, and several improvements are already in development. - See the - <link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>. - </para> - </section> - <section xml:id="perf.hdfs.configs.localread"> - <title>Leveraging local data</title> -<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via -<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>, -it is possible for the DFSClient to take a "short circuit" and -read directly from disk instead of going through the DataNode when the -data is local. What this means for HBase is that the RegionServers can -read directly off their machine's disks instead of having to open a -socket to talk to the DataNode, the former being generally much -faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>. -Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for -more discussion around short circuit reads. -</para> -<para>How to enable "short circuit" reads depends on your version of Hadoop. - The original shortcircuit read patch was much improved upon in Hadoop 2 in - <link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link>. - See <link xlink:href="http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/">this Cloudera blog post</link> for details - on the difference between the old and new implementations.
See - <link xlink:href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Hadoop shortcircuit reads configuration page</link> - for how to enable the latter version of shortcircuit. -</para> -<para>If you are running on an old Hadoop, one that is without - <link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but that - has -<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>, -you must set two configurations. -First, the hdfs-site.xml needs to be amended. Set -the property <varname>dfs.block.local-path-access.user</varname> -to be the <emphasis>only</emphasis> user that can use the shortcut. -This has to be the user that started HBase. Then in hbase-site.xml, -set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>. -</para> -<para> - For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled. - To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into - its datablocks and verify against these. See <xref linkend="hbase.regionserver.checksum.verify" />. When both - local short-circuit reads and HBase-level checksums are enabled, you SHOULD NOT disable the configuration parameter - "dfs.client.read.shortcircuit.skip.checksum", which would cause checksums to be skipped on non-HFile reads. HBase already - manages that setting under the covers. -</para> -<para> -The DataNodes need to be restarted in order to pick up the new -configuration. Be aware that if a process started under a -username other than the one configured here also has the shortcircuit -enabled, it will get an Exception regarding an unauthorized access but -the data will still be read.
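For the pre-HDFS-347 setup just described, the two settings translate into fragments like the following. This is a sketch: the username "hbase" is a placeholder, and the value must be the user that started HBase; the buffer-size property follows the note elsewhere in this section about lowering the 1M default.

```xml
<!-- hdfs-site.xml: old (HDFS-2246) short-circuit setup.
     "hbase" is a placeholder; use the user that started HBase. -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>

<!-- hbase-site.xml: these are HDFS client-side settings, so they
     go in the HBase configs. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- 128k instead of the 1M default; allocated per open block -->
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
```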
-</para> -<note xml:id="dfs.client.read.shortcircuit.buffer.size"> - <title>dfs.client.read.shortcircuit.buffer.size</title> - <para>The default for this value is too high when running on a highly trafficked HBase. Set it down from its - 1M default to 128k or so. Put this configuration in the HBase configs (it's an HDFS client-side configuration). - The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for <emphasis>each</emphasis> - block it has open; given HBase keeps its HDFS files open all the time, this can add up quickly.</para> -</note> - </section> - - <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title> - <para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as - a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, - returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this - processing context. There is room for improvement and this gap will, over time, be reduced, but HDFS - will always be faster in this use-case. - </para> - </section> - </section> - - <section xml:id="perf.ec2"><title>Amazon EC2</title> - <para>Performance questions are common on Amazon EC2 environments because it is a shared environment. You will - not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same - reason (i.e., it's a shared environment and you don't know what else is happening on the server). - </para> - <para>If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front, - because EC2 issues are practically a separate class of performance issues.
- </para> - </section> - - <section xml:id="perf.hbase.mr.cluster"><title>Collocating HBase and MapReduce</title> - <para>It is often recommended to have different clusters for HBase and MapReduce. A better qualification of this is: - don't collocate an HBase that serves live requests with a heavy MR workload. OLTP and OLAP-optimized systems have - conflicting requirements and one will lose to the other, usually the former. For example, short latency-sensitive - disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as - possible. MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate - blocks in the <xref linkend="block.cache"/>. - </para> - <para>If you need to process the data from your live HBase cluster in MR, you can ship the deltas with <xref linkend="copy.table"/> - or use replication to get the new data in real time on the OLAP cluster. In the worst case, if you really need to - collocate both, set MR to use fewer Map and Reduce slots than you'd normally configure, possibly just one. - </para> - <para>When HBase is used for OLAP operations, it's preferable to set it up in a hardened way, like configuring a higher ZooKeeper session - timeout and giving more memory to the MemStores (the argument being that the Block Cache won't be used much - since the workloads are usually long scans). - </para> - </section> - - <section xml:id="perf.casestudy"><title>Case Studies</title> - <para>For Performance and Troubleshooting Case Studies, see <xref linkend="casestudies"/>. - </para> - </section> -</chapter>
http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/preface.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/preface.xml b/src/main/docbkx/preface.xml deleted file mode 100644 index 5308037..0000000 --- a/src/main/docbkx/preface.xml +++ /dev/null @@ -1,70 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<preface version="5.0" xml:id="preface" xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> -<!-- -/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ ---> - <title>Preface</title> - - <para>This is the official reference guide for the <link - xlink:href="http://hbase.apache.org/">HBase</link> version it ships with. 
- Herein you will find either the definitive documentation on an HBase topic - as of its standing when the referenced HBase version shipped, or it - will point to the location in <link - xlink:href="http://hbase.apache.org/apidocs/index.html">javadoc</link>, - <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link> - or <link xlink:href="http://wiki.apache.org/hadoop/Hbase">wiki</link> where - the pertinent information can be found.</para> - - <para>This reference guide is a work in progress. The source for this guide can - be found at <filename>src/main/docbkx</filename> in a checkout of the hbase - project. This reference guide is marked up using - <link xlink:href="http://www.docbook.com/">DocBook</link> from which - the finished guide is generated as part of the 'site' build target. Run - <programlisting>mvn site</programlisting> to generate this documentation. - Amendments and improvements to the documentation are welcomed. Add a - patch to an issue up in the HBase <link - xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para> - - <note xml:id="headsup"> - <title>Heads-up</title> - <para> - If this is your first foray into the wonderful world of - Distributed Computing, then you are in for - some interesting times. First off, distributed systems are - hard; making a distributed system hum requires a disparate - skillset that spans systems (hardware and software) and - networking. Your cluster's operation can hiccup because of any - of a myriad set of reasons from bugs in HBase itself through misconfigurations - -- misconfiguration of HBase but also operating system misconfigurations - - through to hardware problems whether it be a bug in your network card - drivers or an underprovisioned RAM bus (to mention two recent - examples of hardware issues that manifested as "HBase is slow"). - You will also need to do a recalibration if up to this point your - computing has been bound to a single box.
Here is one good - starting point: - <link xlink:href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">Fallacies of Distributed Computing</link>. - </para> - </note> -</preface> http://git-wip-us.apache.org/repos/asf/hbase/blob/7bf6c024/src/main/docbkx/rpc.xml ---------------------------------------------------------------------- diff --git a/src/main/docbkx/rpc.xml b/src/main/docbkx/rpc.xml deleted file mode 100644 index cbc59b7..0000000 --- a/src/main/docbkx/rpc.xml +++ /dev/null @@ -1,236 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<appendix xml:id="hbase.rpc" - version="5.0" xmlns="http://docbook.org/ns/docbook" - xmlns:xlink="http://www.w3.org/1999/xlink" - xmlns:xi="http://www.w3.org/2001/XInclude" - xmlns:svg="http://www.w3.org/2000/svg" - xmlns:m="http://www.w3.org/1998/Math/MathML" - xmlns:html="http://www.w3.org/1999/xhtml" - xmlns:db="http://docbook.org/ns/docbook"> - <!--/** - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ ---> - - <title>0.95 RPC Specification</title> - <para>In 0.95, all client/server communication is done with - <link xlink:href="https://code.google.com/p/protobuf/">protobuf'ed</link> Messages rather than with - <link xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html">Hadoop Writables</link>. - Our RPC wire format therefore changes. - This document describes the client/server request/response protocol and our new RPC wire-format.</para> - <para/> - <para>For what RPC is like in 0.94 and previous, - see Benoît/Tsuna's <link xlink:href="https://github.com/OpenTSDB/asynchbase/blob/master/src/HBaseRpc.java#L164">Unofficial Hadoop / HBase RPC protocol documentation</link>. - For more background on how we arrived at this spec., see - <link xlink:href="https://docs.google.com/document/d/1WCKwgaLDqBw2vpux0jPsAu2WPTRISob7HGCO8YhfDTA/edit#">HBase RPC: WIP</link></para> - <para/> - <section><title>Goals</title> - <para> - <orderedlist> - <listitem> - <para>A wire-format we can evolve</para> - </listitem> - <listitem> - <para>A format that does not require our rewriting server core or - radically changing its current architecture (for later).</para> - </listitem> - </orderedlist> - </para> - </section> - <section><title>TODO</title> - <para> - <orderedlist> - <listitem> - <para>List of problems with the currently specified format and where - we would like to go in a version 2, etc. For example, what would we - have to change if anything to move the server async or to support - streaming/chunking?</para> - </listitem> - <listitem> - <para>Diagram on how it works</para> - </listitem> - <listitem> - <para>A grammar that succinctly describes the wire-format. Currently - we have these words and the content of the rpc protobuf idl but - a grammar for the back and forth would help with grokking rpc.
Also, - a little state machine on client/server interactions would help - with understanding (and ensuring correct implementation).</para> - </listitem> - </orderedlist> - </para> - </section> - <section><title>RPC</title> - <para>The client will send setup information on connection establishment. - Thereafter, the client invokes methods against the remote server, sending a protobuf Message and receiving a protobuf Message in response. - Communication is synchronous. All back and forth is preceded by an int that has the total length of the request/response. - Optionally, Cells(KeyValues) can be passed outside of protobufs in follow-behind Cell blocks (because - <link xlink:href="https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#">we can't protobuf megabytes of KeyValues</link> or Cells). - These CellBlocks are encoded and optionally compressed.</para> - <para/> - <para>For more detail on the protobufs involved, see the - <link xlink:href="http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RPC.proto?view=markup">RPC.proto</link> file in trunk.</para> - - <section> - <title>Connection Setup</title> - <para>Client initiates connection.</para> - <section><title>Client</title> - <para>On connection setup, client sends a preamble followed by a connection header. - </para> - - <section> - <title><preamble></title> - <para><programlisting><MAGIC 4 byte integer> <1 byte RPC Format Version> <1 byte auth type><footnote><para> We need the auth method spec. here so the connection header is encoded if auth enabled.</para></footnote></programlisting></para> - <para>E.g.: HBas0x000x50 -- 4 bytes of MAGIC -- "HBas" -- plus one byte of version, 0 in this case, and one byte, 0x50 (SIMPLE), of an auth type.</para> - </section> - - <section> - <title><Protobuf ConnectionHeader Message></title> - <para>Has user info, and "protocol", as well as the encoders and compression the client will use sending CellBlocks.
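The preamble bytes just described are simple enough to sketch. The following is illustrative only, not official client code; it builds the six preamble bytes for RPC format version 0 with SIMPLE auth, exactly as spelled out above:

```python
# Build the 6-byte HBase 0.95 RPC connection preamble described above:
# 4 bytes of MAGIC ("HBas"), 1 byte of RPC format version (0),
# and 1 byte of auth type (0x50, SIMPLE).
MAGIC = b"HBas"
RPC_VERSION = 0
AUTH_SIMPLE = 0x50

preamble = MAGIC + bytes([RPC_VERSION, AUTH_SIMPLE])

# The wire bytes, in hex: "HBas" is 48 42 61 73, then 00 and 50.
print(preamble.hex())  # → "484261730050"
```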
- CellBlock encoders and compressors are for the life of the connection. - CellBlock encoders implement org.apache.hadoop.hbase.codec.Codec. - CellBlocks may then also be compressed. - Compressors implement org.apache.hadoop.io.compress.CompressionCodec. - This protobuf is written using writeDelimited so is prefaced by a pb varint - with its serialized length.</para> - </section> - </section><!--Client--> - - <section><title>Server</title> - <para>After the client sends the preamble and connection header, - the server does NOT respond if connection setup succeeded. - No response means the server is READY to accept requests and to give out responses. - If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, - it will throw an org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect. - If the client in the connection header -- i.e. the protobuf'd Message that comes after the connection preamble -- asks for a - Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.</para> - </section> - </section> - - <section><title>Request</title> - <para>After a Connection has been set up, client makes requests. Server responds.</para> - <para>A request is made up of a protobuf RequestHeader followed by a protobuf Message parameter. - The header includes the method name and optionally, metadata on the optional CellBlock that may be following. - The parameter type suits the method being invoked: i.e. if we are doing a getRegionInfo request, - the protobuf Message param will be an instance of GetRegionInfoRequest. - The response will be a GetRegionInfoResponse.
- The CellBlock is optionally used for ferrying the bulk of the RPC data: i.e. Cells/KeyValues.</para> - <para/> - <section><title>Request Parts</title> - <section><title><Total Length></title> - <para>The request is prefaced by an int that holds the total length of what follows.</para> - </section> - <section><title><Protobuf RequestHeader Message></title> - <para>Will have call.id, trace.id, and method name, etc. including optional Metadata on the Cell block IFF one is following. - Data is protobuf'd inline in this pb Message or optionally comes in the following CellBlock.</para> - </section> - <section><title><Protobuf Param Message></title> - <para>If the method being invoked is getRegionInfo, if you study the Service descriptor for the client to regionserver protocol, - you will find that the request sends a GetRegionInfoRequest protobuf Message param in this position.</para> - </section> - <section><title><CellBlock></title> - <para>An encoded and optionally compressed Cell block.</para> - </section> - </section><!--Request parts--> - </section><!--Request--> - - <section><title>Response</title> - <para>Same as a Request, it is a protobuf ResponseHeader followed by a protobuf Message response where the Message response type suits the method invoked. - Bulk of the data may come in a following CellBlock.</para> - <section><title>Response Parts</title> - <section><title><Total Length></title> - <para>The response is prefaced by an int that holds the total length of what follows.</para> - </section> - <section><title><Protobuf ResponseHeader Message></title> - <para>Will have call.id, etc. Will include an exception if processing failed. Optionally includes metadata on the optional CellBlock, IFF there is one following.</para> - </section> - - <section><title><Protobuf Response Message></title> - <para>The return value; may be nothing if there was an exception.
If the method being invoked is getRegionInfo, if you study the Service descriptor for the client to regionserver protocol, - you will find that the response sends a GetRegionInfoResponse protobuf Message param in this position.</para> - </section> - <section><title><CellBlock></title> - <para>An encoded and optionally compressed Cell block.</para> - </section> - </section><!--Parts--> - </section><!--Response--> - - <section><title>Exceptions</title> - <para>There are two distinct types. - There is the request-failed type, which is encapsulated inside the response header for the response. - The connection stays open to receive new requests. - The second type, the FatalConnectionException, kills the connection.</para> - <para>Exceptions can carry extra information. - See the ExceptionResponse protobuf type. - It has a flag to indicate do-not-retry as well as other miscellaneous payload to help improve client responsiveness.</para> - </section> - <section><title>CellBlocks</title> - <para>These are not versioned. - Either the server can do the codec or it cannot. - If a new version of a codec is needed, with, say, tighter encoding, give it a new class name. - Codecs will live on the server for all time so old clients can connect.</para> - </section> - </section> - - - <section><title>Notes</title> - <section><title>Constraints</title> - <para>In some part, the current wire-format -- i.e. all requests and responses preceded by a length -- has been dictated by the current server's non-async architecture.</para> - </section> - <section><title>One fat pb request or header+param</title> - <para>We went with a pb header followed by a pb param making a request, and a pb header followed by a pb response, for now.
- Doing header+param rather than a single protobuf Message with both header and param content:</para> - <para> - <orderedlist> - <listitem> - <para>Is closer to what we currently have</para> - </listitem> - <listitem> - <para>Having a single fat pb requires extra copying, putting the already pb'd param into the body of the fat request pb (and the same when making the result)</para> - </listitem> - <listitem> - <para>We can decide whether to accept the request or not before we read the param; for example, the request might be low priority. As is, we read header+param in one go as the server is currently implemented, so this is a TODO.</para> - </listitem> - </orderedlist> - </para> - <para>The advantages are minor. If, later, the fat request has a clear advantage, we can roll out a v2.</para> - </section> - <section xml:id="rpc.configs"><title>RPC Configurations</title> - <section><title>CellBlock Codecs</title> - <para>To enable a codec other than the default <classname>KeyValueCodec</classname>, - set <varname>hbase.client.rpc.codec</varname> - to the name of the Codec class to use. The Codec must implement HBase's <classname>Codec</classname> Interface. After connection setup, - all passed cellblocks will be sent with this codec. The server will return cellblocks using this same codec as long - as the codec is on the server's CLASSPATH (else you will get <classname>UnsupportedCellCodecException</classname>).</para> - <para>To change the default codec, set <varname>hbase.client.default.rpc.codec</varname>. - </para> - <para>To disable cellblocks completely and to go pure protobuf, set the default to the - empty String and do not specify a codec in your Configuration. So, set <varname>hbase.client.default.rpc.codec</varname> - to the empty string and do not set <varname>hbase.client.rpc.codec</varname>. - This will cause the client to connect to the server with no codec specified. - If a server sees no codec, it will return all responses in pure protobuf.
- Running pure protobuf all the time will be slower than running with cellblocks. - </para> - </section> - <section><title>Compression</title> - <para>Uses Hadoop's compression codecs. To enable compression of passed CellBlocks, set <varname>hbase.client.rpc.compressor</varname> - to the name of the Compressor to use. The Compressor must implement Hadoop's CompressionCodec Interface. After connection setup, - all passed cellblocks will be sent compressed. The server will return cellblocks compressed using this same compressor as long - as the compressor is on its CLASSPATH (else you will get <classname>UnsupportedCompressionCodecException</classname>).</para> - </section> - </section> - </section> -</appendix>
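Pulling the RPC configuration knobs from the sections above together, a client-side hbase-site.xml sketch might look like the following. The class names are illustrative choices (KeyValueCodec is the documented default codec; GzipCodec is one standard Hadoop CompressionCodec implementation); verify both against the version you run.

```xml
<!-- Client-side sketch: pick a CellBlock codec and compressor.
     Both classes must also be on the server's CLASSPATH, else the server
     throws UnsupportedCellCodecException / UnsupportedCompressionCodecException. -->
<property>
  <name>hbase.client.rpc.codec</name>
  <value>org.apache.hadoop.hbase.codec.KeyValueCodec</value>
</property>
<property>
  <name>hbase.client.rpc.compressor</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```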
