http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/hbase-default.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc index 8df9b17..26929a3 100644 --- a/src/main/asciidoc/_chapters/hbase-default.adoc +++ b/src/main/asciidoc/_chapters/hbase-default.adoc @@ -46,7 +46,7 @@ Temporary directory on the local filesystem. .Default `${java.io.tmpdir}/hbase-${user.name}` - + [[hbase.rootdir]] *`hbase.rootdir`*:: + @@ -64,7 +64,7 @@ The directory shared by region servers and into .Default `${hbase.tmp.dir}/hbase` - + [[hbase.cluster.distributed]] *`hbase.cluster.distributed`*:: + @@ -77,7 +77,7 @@ The mode the cluster will be in. Possible values are .Default `false` - + [[hbase.zookeeper.quorum]] *`hbase.zookeeper.quorum`*:: + @@ -97,7 +97,7 @@ Comma separated list of servers in the ZooKeeper ensemble .Default `localhost` - + [[hbase.local.dir]] *`hbase.local.dir`*:: + @@ -108,7 +108,7 @@ Directory on the local filesystem to be used .Default `${hbase.tmp.dir}/local/` - + [[hbase.master.info.port]] *`hbase.master.info.port`*:: + @@ -119,18 +119,18 @@ The port for the HBase Master web UI. 
.Default `16010` - + [[hbase.master.info.bindAddress]] *`hbase.master.info.bindAddress`*:: + .Description The bind address for the HBase Master web UI - + + .Default `0.0.0.0` - + [[hbase.master.logcleaner.plugins]] *`hbase.master.logcleaner.plugins`*:: + @@ -145,7 +145,7 @@ A comma-separated list of BaseLogCleanerDelegate invoked by .Default `org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner` - + [[hbase.master.logcleaner.ttl]] *`hbase.master.logcleaner.ttl`*:: + @@ -156,7 +156,7 @@ Maximum time a WAL can stay in the .oldlogdir directory, .Default `600000` - + [[hbase.master.hfilecleaner.plugins]] *`hbase.master.hfilecleaner.plugins`*:: + @@ -172,7 +172,7 @@ A comma-separated list of BaseHFileCleanerDelegate invoked by .Default `org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner` - + [[hbase.master.catalog.timeout]] *`hbase.master.catalog.timeout`*:: + @@ -183,7 +183,7 @@ Timeout value for the Catalog Janitor from the master to .Default `600000` - + [[hbase.master.infoserver.redirect]] *`hbase.master.infoserver.redirect`*:: + @@ -195,7 +195,7 @@ Whether or not the Master listens to the Master web .Default `true` - + [[hbase.regionserver.port]] *`hbase.regionserver.port`*:: + @@ -205,7 +205,7 @@ The port the HBase RegionServer binds to. .Default `16020` - + [[hbase.regionserver.info.port]] *`hbase.regionserver.info.port`*:: + @@ -216,7 +216,7 @@ The port for the HBase RegionServer web UI .Default `16030` - + [[hbase.regionserver.info.bindAddress]] *`hbase.regionserver.info.bindAddress`*:: + @@ -226,7 +226,7 @@ The address for the HBase RegionServer web UI .Default `0.0.0.0` - + [[hbase.regionserver.info.port.auto]] *`hbase.regionserver.info.port.auto`*:: + @@ -239,7 +239,7 @@ Whether or not the Master or RegionServer .Default `false` - + [[hbase.regionserver.handler.count]] *`hbase.regionserver.handler.count`*:: + @@ -250,7 +250,7 @@ Count of RPC Listener instances spun up on RegionServers. 
.Default `30` - + [[hbase.ipc.server.callqueue.handler.factor]] *`hbase.ipc.server.callqueue.handler.factor`*:: + @@ -262,7 +262,7 @@ Factor to determine the number of call queues. .Default `0.1` - + [[hbase.ipc.server.callqueue.read.ratio]] *`hbase.ipc.server.callqueue.read.ratio`*:: + @@ -287,12 +287,12 @@ Split the call queues into read and write queues. and 2 queues will contain only write requests. a read.ratio of 1 means that: 9 queues will contain only read requests and 1 queues will contain only write requests. - + + .Default `0` - + [[hbase.ipc.server.callqueue.scan.ratio]] *`hbase.ipc.server.callqueue.scan.ratio`*:: + @@ -313,12 +313,12 @@ Given the number of read call queues, calculated from the total number and 4 queues will contain only short-read requests. a scan.ratio of 0.8 means that: 6 queues will contain only long-read requests and 2 queues will contain only short-read requests. - + + .Default `0` - + [[hbase.regionserver.msginterval]] *`hbase.regionserver.msginterval`*:: + @@ -329,7 +329,7 @@ Interval between messages from the RegionServer to Master .Default `3000` - + [[hbase.regionserver.regionSplitLimit]] *`hbase.regionserver.regionSplitLimit`*:: + @@ -342,7 +342,7 @@ Limit for the number of regions after which no more region .Default `2147483647` - + [[hbase.regionserver.logroll.period]] *`hbase.regionserver.logroll.period`*:: + @@ -353,7 +353,7 @@ Period at which we will roll the commit log regardless .Default `3600000` - + [[hbase.regionserver.logroll.errors.tolerated]] *`hbase.regionserver.logroll.errors.tolerated`*:: + @@ -367,7 +367,7 @@ The number of consecutive WAL close errors we will allow .Default `2` - + [[hbase.regionserver.hlog.reader.impl]] *`hbase.regionserver.hlog.reader.impl`*:: + @@ -377,7 +377,7 @@ The WAL file reader implementation. 
.Default `org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader` - + [[hbase.regionserver.hlog.writer.impl]] *`hbase.regionserver.hlog.writer.impl`*:: + @@ -387,7 +387,7 @@ The WAL file writer implementation. .Default `org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter` - + [[hbase.master.distributed.log.replay]] *`hbase.master.distributed.log.replay`*:: + @@ -397,13 +397,13 @@ Enable 'distributed log replay' as default engine splitting back to the old mode 'distributed log splitter', set the value to 'false'. 'Disributed log replay' improves MTTR because it does not write intermediate files. 'DLR' required that 'hfile.format.version' - be set to version 3 or higher. - + be set to version 3 or higher. + + .Default `true` - + [[hbase.regionserver.global.memstore.size]] *`hbase.regionserver.global.memstore.size`*:: + @@ -416,20 +416,20 @@ Maximum size of all memstores in a region server before new .Default `0.4` - + [[hbase.regionserver.global.memstore.size.lower.limit]] *`hbase.regionserver.global.memstore.size.lower.limit`*:: + .Description Maximum size of all memstores in a region server before flushes are forced. Defaults to 95% of hbase.regionserver.global.memstore.size. - A 100% value for this value causes the minimum possible flushing to occur when updates are + A 100% value for this value causes the minimum possible flushing to occur when updates are blocked due to memstore limiting. + .Default `0.95` - + [[hbase.regionserver.optionalcacheflushinterval]] *`hbase.regionserver.optionalcacheflushinterval`*:: + @@ -441,7 +441,7 @@ Maximum size of all memstores in a region server before flushes are forced. .Default `3600000` - + [[hbase.regionserver.catalog.timeout]] *`hbase.regionserver.catalog.timeout`*:: + @@ -451,7 +451,7 @@ Timeout value for the Catalog Janitor from the regionserver to META. 
.Default `600000` - + [[hbase.regionserver.dns.interface]] *`hbase.regionserver.dns.interface`*:: + @@ -462,7 +462,7 @@ The name of the Network Interface from which a region server .Default `default` - + [[hbase.regionserver.dns.nameserver]] *`hbase.regionserver.dns.nameserver`*:: + @@ -474,7 +474,7 @@ The host name or IP address of the name server (DNS) .Default `default` - + [[hbase.regionserver.region.split.policy]] *`hbase.regionserver.region.split.policy`*:: + @@ -483,12 +483,12 @@ The host name or IP address of the name server (DNS) A split policy determines when a region should be split. The various other split policies that are available currently are ConstantSizeRegionSplitPolicy, DisabledRegionSplitPolicy, DelimitedKeyPrefixRegionSplitPolicy, KeyPrefixRegionSplitPolicy etc. - + + .Default `org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy` - + [[zookeeper.session.timeout]] *`zookeeper.session.timeout`*:: + @@ -497,17 +497,18 @@ ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is used in the ZK client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions. - For example, if a HBase region server connects to a ZK ensemble that's also managed by HBase, then the + For example, if an HBase region server connects to a ZK ensemble that's also managed + by HBase, then the session timeout will be the one specified by this configuration. But, a region server that connects to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So, even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and it will take precedence. The current default that ZK ships with is 40 seconds, which is lower than HBase's. 
- + + .Default `90000` - + [[zookeeper.znode.parent]] *`zookeeper.znode.parent`*:: + @@ -520,7 +521,7 @@ Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper .Default `/hbase` - + [[zookeeper.znode.rootserver]] *`zookeeper.znode.rootserver`*:: + @@ -533,7 +534,7 @@ Path to ZNode holding root region location. This is written by .Default `root-region-server` - + [[zookeeper.znode.acl.parent]] *`zookeeper.znode.acl.parent`*:: + @@ -543,7 +544,7 @@ Root ZNode for access control lists. .Default `acl` - + [[hbase.zookeeper.dns.interface]] *`hbase.zookeeper.dns.interface`*:: + @@ -554,7 +555,7 @@ The name of the Network Interface from which a ZooKeeper server .Default `default` - + [[hbase.zookeeper.dns.nameserver]] *`hbase.zookeeper.dns.nameserver`*:: + @@ -566,7 +567,7 @@ The host name or IP address of the name server (DNS) .Default `default` - + [[hbase.zookeeper.peerport]] *`hbase.zookeeper.peerport`*:: + @@ -578,7 +579,7 @@ Port used by ZooKeeper peers to talk to each other. .Default `2888` - + [[hbase.zookeeper.leaderport]] *`hbase.zookeeper.leaderport`*:: + @@ -590,7 +591,7 @@ Port used by ZooKeeper for leader election. .Default `3888` - + [[hbase.zookeeper.useMulti]] *`hbase.zookeeper.useMulti`*:: + @@ -616,7 +617,7 @@ Property from ZooKeeper's config zoo.cfg. .Default `10` - + [[hbase.zookeeper.property.syncLimit]] *`hbase.zookeeper.property.syncLimit`*:: + @@ -628,7 +629,7 @@ Property from ZooKeeper's config zoo.cfg. .Default `5` - + [[hbase.zookeeper.property.dataDir]] *`hbase.zookeeper.property.dataDir`*:: + @@ -639,7 +640,7 @@ Property from ZooKeeper's config zoo.cfg. .Default `${hbase.tmp.dir}/zookeeper` - + [[hbase.zookeeper.property.clientPort]] *`hbase.zookeeper.property.clientPort`*:: + @@ -650,7 +651,7 @@ Property from ZooKeeper's config zoo.cfg. .Default `2181` - + [[hbase.zookeeper.property.maxClientCnxns]] *`hbase.zookeeper.property.maxClientCnxns`*:: + @@ -664,7 +665,7 @@ Property from ZooKeeper's config zoo.cfg. 
.Default `300` - + [[hbase.client.write.buffer]] *`hbase.client.write.buffer`*:: + @@ -679,7 +680,7 @@ Default size of the HTable client write buffer in bytes. .Default `2097152` - + [[hbase.client.pause]] *`hbase.client.pause`*:: + @@ -692,7 +693,7 @@ General client pause value. Used mostly as value to wait .Default `100` - + [[hbase.client.retries.number]] *`hbase.client.retries.number`*:: + @@ -707,7 +708,7 @@ Maximum retries. Used as maximum for all retryable .Default `35` - + [[hbase.client.max.total.tasks]] *`hbase.client.max.total.tasks`*:: + @@ -718,7 +719,7 @@ The maximum number of concurrent tasks a single HTable instance will .Default `100` - + [[hbase.client.max.perserver.tasks]] *`hbase.client.max.perserver.tasks`*:: + @@ -729,7 +730,7 @@ The maximum number of concurrent tasks a single HTable instance will .Default `5` - + [[hbase.client.max.perregion.tasks]] *`hbase.client.max.perregion.tasks`*:: + @@ -742,7 +743,7 @@ The maximum number of concurrent connections the client will .Default `1` - + [[hbase.client.scanner.caching]] *`hbase.client.scanner.caching`*:: + @@ -757,7 +758,7 @@ Number of rows that will be fetched when calling next .Default `100` - + [[hbase.client.keyvalue.maxsize]] *`hbase.client.keyvalue.maxsize`*:: + @@ -772,7 +773,7 @@ Specifies the combined maximum allowed size of a KeyValue .Default `10485760` - + [[hbase.client.scanner.timeout.period]] *`hbase.client.scanner.timeout.period`*:: + @@ -782,7 +783,7 @@ Client scanner lease period in milliseconds. .Default `60000` - + [[hbase.client.localityCheck.threadPoolSize]] *`hbase.client.localityCheck.threadPoolSize`*:: + @@ -792,7 +793,7 @@ Client scanner lease period in milliseconds. .Default `2` - + [[hbase.bulkload.retries.number]] *`hbase.bulkload.retries.number`*:: + @@ -804,7 +805,7 @@ Maximum retries. This is maximum number of iterations .Default `10` - + [[hbase.balancer.period ]] *`hbase.balancer.period @@ -816,7 +817,7 @@ Period at which the region balancer runs in the Master. 
.Default `300000` - + [[hbase.regions.slop]] *`hbase.regions.slop`*:: + @@ -826,7 +827,7 @@ Rebalance if any regionserver has average + (average * slop) regions. .Default `0.2` - + [[hbase.server.thread.wakefrequency]] *`hbase.server.thread.wakefrequency`*:: + @@ -837,20 +838,20 @@ Time to sleep in between searches for work (in milliseconds). .Default `10000` - + [[hbase.server.versionfile.writeattempts]] *`hbase.server.versionfile.writeattempts`*:: + .Description How many time to retry attempting to write a version file - before just aborting. Each attempt is seperated by the + before just aborting. Each attempt is separated by the hbase.server.thread.wakefrequency milliseconds. + .Default `3` - + [[hbase.hregion.memstore.flush.size]] *`hbase.hregion.memstore.flush.size`*:: + @@ -863,7 +864,7 @@ Time to sleep in between searches for work (in milliseconds). .Default `134217728` - + [[hbase.hregion.percolumnfamilyflush.size.lower.bound]] *`hbase.hregion.percolumnfamilyflush.size.lower.bound`*:: + @@ -876,12 +877,12 @@ Time to sleep in between searches for work (in milliseconds). memstore size more than this, all the memstores will be flushed (just as usual). This value should be less than half of the total memstore threshold (hbase.hregion.memstore.flush.size). - + + .Default `16777216` - + [[hbase.hregion.preclose.flush.size]] *`hbase.hregion.preclose.flush.size`*:: + @@ -900,7 +901,7 @@ Time to sleep in between searches for work (in milliseconds). .Default `5242880` - + [[hbase.hregion.memstore.block.multiplier]] *`hbase.hregion.memstore.block.multiplier`*:: + @@ -916,7 +917,7 @@ Time to sleep in between searches for work (in milliseconds). .Default `4` - + [[hbase.hregion.memstore.mslab.enabled]] *`hbase.hregion.memstore.mslab.enabled`*:: + @@ -930,19 +931,19 @@ Time to sleep in between searches for work (in milliseconds). .Default `true` - + [[hbase.hregion.max.filesize]] *`hbase.hregion.max.filesize`*:: + .Description - Maximum HFile size. 
If the sum of the sizes of a region's HFiles has grown to exceed this + Maximum HFile size. If the sum of the sizes of a region's HFiles has grown to exceed this value, the region is split in two. + .Default `10737418240` - + [[hbase.hregion.majorcompaction]] *`hbase.hregion.majorcompaction`*:: + @@ -959,7 +960,7 @@ Time between major compactions, expressed in milliseconds. Set to 0 to disable .Default `604800000` - + [[hbase.hregion.majorcompaction.jitter]] *`hbase.hregion.majorcompaction.jitter`*:: + @@ -972,32 +973,32 @@ A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occ .Default `0.50` - + [[hbase.hstore.compactionThreshold]] *`hbase.hstore.compactionThreshold`*:: + .Description - If more than this number of StoreFiles exist in any one Store - (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all + If more than this number of StoreFiles exist in any one Store + (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all StoreFiles into a single StoreFile. Larger values delay compaction, but when compaction does occur, it takes longer to complete. + .Default `3` - + [[hbase.hstore.flusher.count]] *`hbase.hstore.flusher.count`*:: + .Description The number of flush threads. With fewer threads, the MemStore flushes will be queued. With more threads, the flushes will be executed in parallel, increasing the load on - HDFS, and potentially causing more compactions. + HDFS, and potentially causing more compactions. + .Default `2` - + [[hbase.hstore.blockingStoreFiles]] *`hbase.hstore.blockingStoreFiles`*:: + @@ -1009,40 +1010,40 @@ A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occ .Default `10` - + [[hbase.hstore.blockingWaitTime]] *`hbase.hstore.blockingWaitTime`*:: + .Description The time for which a region will block updates after reaching the StoreFile limit - defined by hbase.hstore.blockingStoreFiles. 
After this time has elapsed, the region will stop + defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the region will stop blocking updates even if a compaction has not been completed. + .Default `90000` - + [[hbase.hstore.compaction.min]] *`hbase.hstore.compaction.min`*:: + .Description -The minimum number of StoreFiles which must be eligible for compaction before - compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid ending up with - too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction +The minimum number of StoreFiles which must be eligible for compaction before + compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid ending up with + too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you - set this value too high, all the other values will need to be adjusted accordingly. For most + set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In previous versions of HBase, the parameter hbase.hstore.compaction.min was named hbase.hstore.compactionThreshold. + .Default `3` - + [[hbase.hstore.compaction.max]] *`hbase.hstore.compaction.max`*:: + .Description -The maximum number of StoreFiles which will be selected for a single minor +The maximum number of StoreFiles which will be selected for a single minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of hbase.hstore.compaction.max controls the length of time it takes a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. 
For most @@ -1051,88 +1052,88 @@ The maximum number of StoreFiles which will be selected for a single minor .Default `10` - + [[hbase.hstore.compaction.min.size]] *`hbase.hstore.compaction.min.size`*:: + .Description -A StoreFile smaller than this size will always be eligible for minor compaction. - HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if - they are eligible. Because this limit represents the "automatic include"limit for all - StoreFiles smaller than this value, this value may need to be reduced in write-heavy - environments where many StoreFiles in the 1-2 MB range are being flushed, because every +A StoreFile smaller than this size will always be eligible for minor compaction. + HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if + they are eligible. Because this limit represents the "automatic include"limit for all + StoreFiles smaller than this value, this value may need to be reduced in write-heavy + environments where many StoreFiles in the 1-2 MB range are being flushed, because every StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the minimum size and require further compaction. If this parameter is lowered, the ratio check is - triggered more quickly. This addressed some issues seen in earlier versions of HBase but - changing this parameter is no longer necessary in most situations. Default: 128 MB expressed + triggered more quickly. This addressed some issues seen in earlier versions of HBase but + changing this parameter is no longer necessary in most situations. Default: 128 MB expressed in bytes. + .Default `134217728` - + [[hbase.hstore.compaction.max.size]] *`hbase.hstore.compaction.max.size`*:: + .Description -A StoreFile larger than this size will be excluded from compaction. 
The effect of - raising hbase.hstore.compaction.max.size is fewer, larger StoreFiles that do not get +A StoreFile larger than this size will be excluded from compaction. The effect of + raising hbase.hstore.compaction.max.size is fewer, larger StoreFiles that do not get compacted often. If you feel that compaction is happening too often without much benefit, you can try raising this value. Default: the value of LONG.MAX_VALUE, expressed in bytes. + .Default `9223372036854775807` - + [[hbase.hstore.compaction.ratio]] *`hbase.hstore.compaction.ratio`*:: + .Description -For minor compaction, this ratio is used to determine whether a given StoreFile +For minor compaction, this ratio is used to determine whether a given StoreFile which is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio - is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single - giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the + is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single + giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and - 1.4 is recommended. When tuning this value, you are balancing write costs with read costs. - Raising the value (to something like 1.4) will have more write costs, because you will - compact larger StoreFiles. However, during reads, HBase will need to seek through fewer - StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of - Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the - background cost of writes, and use Bloom filters to control the number of StoreFiles touched + 1.4 is recommended. 
When tuning this value, you are balancing write costs with read costs. + Raising the value (to something like 1.4) will have more write costs, because you will + compact larger StoreFiles. However, during reads, HBase will need to seek through fewer + StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of + Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the + background cost of writes, and use Bloom filters to control the number of StoreFiles touched during reads. For most cases, the default value is appropriate. + .Default `1.2F` - + [[hbase.hstore.compaction.ratio.offpeak]] *`hbase.hstore.compaction.ratio.offpeak`*:: + .Description Allows you to set a different (by default, more aggressive) ratio for determining - whether larger StoreFiles are included in compactions during off-peak hours. Works in the - same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and + whether larger StoreFiles are included in compactions during off-peak hours. Works in the + same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and hbase.offpeak.end.hour are also enabled. + .Default `5.0F` - + [[hbase.hstore.time.to.purge.deletes]] *`hbase.hstore.time.to.purge.deletes`*:: + .Description -The amount of time to delay purging of delete markers with future timestamps. If - unset, or set to 0, all delete markers, including those with future timestamps, are purged - during the next major compaction. Otherwise, a delete marker is kept until the major compaction +The amount of time to delay purging of delete markers with future timestamps. If + unset, or set to 0, all delete markers, including those with future timestamps, are purged + during the next major compaction. Otherwise, a delete marker is kept until the major compaction which occurs after the marker's timestamp plus the value of this setting, in milliseconds. 
- + + .Default `0` - + [[hbase.offpeak.start.hour]] *`hbase.offpeak.start.hour`*:: + @@ -1143,7 +1144,7 @@ The start of off-peak hours, expressed as an integer between 0 and 23, inclusive .Default `-1` - + [[hbase.offpeak.end.hour]] *`hbase.offpeak.end.hour`*:: + @@ -1154,7 +1155,7 @@ The end of off-peak hours, expressed as an integer between 0 and 23, inclusive. .Default `-1` - + [[hbase.regionserver.thread.compaction.throttle]] *`hbase.regionserver.thread.compaction.throttle`*:: + @@ -1170,19 +1171,19 @@ There are two different thread pools for compactions, one for large compactions .Default `2684354560` - + [[hbase.hstore.compaction.kv.max]] *`hbase.hstore.compaction.kv.max`*:: + .Description The maximum number of KeyValues to read and then write in a batch when flushing or compacting. Set this lower if you have big KeyValues and problems with Out Of Memory - Exceptions Set this higher if you have wide, small rows. + Exceptions Set this higher if you have wide, small rows. + .Default `10` - + [[hbase.storescanner.parallel.seek.enable]] *`hbase.storescanner.parallel.seek.enable`*:: + @@ -1194,7 +1195,7 @@ The maximum number of KeyValues to read and then write in a batch when flushing .Default `false` - + [[hbase.storescanner.parallel.seek.threads]] *`hbase.storescanner.parallel.seek.threads`*:: + @@ -1205,7 +1206,7 @@ The maximum number of KeyValues to read and then write in a batch when flushing .Default `10` - + [[hfile.block.cache.size]] *`hfile.block.cache.size`*:: + @@ -1218,7 +1219,7 @@ Percentage of maximum heap (-Xmx setting) to allocate to block cache .Default `0.4` - + [[hfile.block.index.cacheonwrite]] *`hfile.block.index.cacheonwrite`*:: + @@ -1229,7 +1230,7 @@ This allows to put non-root multi-level index blocks into the block .Default `false` - + [[hfile.index.block.max.size]] *`hfile.index.block.max.size`*:: + @@ -1241,31 +1242,33 @@ When the size of a leaf-level, intermediate-level, or root-level .Default `131072` - + 
[[hbase.bucketcache.ioengine]] *`hbase.bucketcache.ioengine`*:: + .Description -Where to store the contents of the bucketcache. One of: onheap, - offheap, or file. If a file, set it to file:PATH_TO_FILE. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html for more information. - +Where to store the contents of the bucketcache. One of: onheap, + offheap, or file. If a file, set it to file:PATH_TO_FILE. + See https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html + for more information. + + .Default `` - + [[hbase.bucketcache.combinedcache.enabled]] *`hbase.bucketcache.combinedcache.enabled`*:: + .Description -Whether or not the bucketcache is used in league with the LRU - on-heap block cache. In this mode, indices and blooms are kept in the LRU +Whether or not the bucketcache is used in league with the LRU + on-heap block cache. In this mode, indices and blooms are kept in the LRU blockcache and the data blocks are kept in the bucketcache. + .Default `true` - + [[hbase.bucketcache.size]] *`hbase.bucketcache.size`*:: + @@ -1276,19 +1279,19 @@ Used along with bucket cache, this is a float that EITHER represents a percentag .Default `0` when specified as a float - + [[hbase.bucketcache.sizes]] *`hbase.bucketcache.sizes`*:: + .Description -A comma-separated list of sizes for buckets for the bucketcache - if you use multiple sizes. Should be a list of block sizes in order from smallest +A comma-separated list of sizes for buckets for the bucketcache + if you use multiple sizes. Should be a list of block sizes in order from smallest to largest. The sizes you use will depend on your data access patterns. + .Default `` - + [[hfile.format.version]] *`hfile.format.version`*:: + @@ -1296,13 +1299,13 @@ A comma-separated list of sizes for buckets for the bucketcache The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). 
Distributed Log Replay requires that tags are enabled. Also see the configuration - 'hbase.replication.rpc.codec'. - + 'hbase.replication.rpc.codec'. + + .Default `3` - + [[hfile.block.bloom.cacheonwrite]] *`hfile.block.bloom.cacheonwrite`*:: + @@ -1312,7 +1315,7 @@ Enables cache-on-write for inline blocks of a compound Bloom filter. .Default `false` - + [[io.storefile.bloom.block.size]] *`io.storefile.bloom.block.size`*:: + @@ -1325,7 +1328,7 @@ The size in bytes of a single block ("chunk") of a compound Bloom .Default `131072` - + [[hbase.rs.cacheblocksonwrite]] *`hbase.rs.cacheblocksonwrite`*:: + @@ -1336,7 +1339,7 @@ Whether an HFile block should be added to the block cache when the .Default `false` - + [[hbase.rpc.timeout]] *`hbase.rpc.timeout`*:: + @@ -1348,7 +1351,7 @@ This is for the RPC layer to define how long HBase client applications .Default `60000` - + [[hbase.rpc.shortoperation.timeout]] *`hbase.rpc.shortoperation.timeout`*:: + @@ -1361,7 +1364,7 @@ This is another version of "hbase.rpc.timeout". For those RPC operation .Default `10000` - + [[hbase.ipc.client.tcpnodelay]] *`hbase.ipc.client.tcpnodelay`*:: + @@ -1372,7 +1375,7 @@ Set no delay on rpc socket connections. See .Default `true` - + [[hbase.master.keytab.file]] *`hbase.master.keytab.file`*:: + @@ -1383,7 +1386,7 @@ Full path to the kerberos keytab file to use for logging in .Default `` - + [[hbase.master.kerberos.principal]] *`hbase.master.kerberos.principal`*:: + @@ -1397,7 +1400,7 @@ Ex. "hbase/_HOST@EXAMPLE.COM". The kerberos principal name .Default `` - + [[hbase.regionserver.keytab.file]] *`hbase.regionserver.keytab.file`*:: + @@ -1408,7 +1411,7 @@ Full path to the kerberos keytab file to use for logging in .Default `` - + [[hbase.regionserver.kerberos.principal]] *`hbase.regionserver.kerberos.principal`*:: + @@ -1423,7 +1426,7 @@ Ex. "hbase/_HOST@EXAMPLE.COM". 
The kerberos principal name .Default `` - + [[hadoop.policy.file]] *`hadoop.policy.file`*:: + @@ -1435,7 +1438,7 @@ The policy configuration file used by RPC servers to make .Default `hbase-policy.xml` - + [[hbase.superuser]] *`hbase.superuser`*:: + @@ -1447,7 +1450,7 @@ List of users or groups (comma-separated), who are allowed .Default `` - + [[hbase.auth.key.update.interval]] *`hbase.auth.key.update.interval`*:: + @@ -1458,7 +1461,7 @@ The update interval for master key for authentication tokens .Default `86400000` - + [[hbase.auth.token.max.lifetime]] *`hbase.auth.token.max.lifetime`*:: + @@ -1469,7 +1472,7 @@ The maximum lifetime in milliseconds after which an .Default `604800000` - + [[hbase.ipc.client.fallback-to-simple-auth-allowed]] *`hbase.ipc.client.fallback-to-simple-auth-allowed`*:: + @@ -1484,7 +1487,7 @@ When a client is configured to attempt a secure connection, but attempts to .Default `false` - + [[hbase.display.keys]] *`hbase.display.keys`*:: + @@ -1496,7 +1499,7 @@ When this is set to true the webUI and such will display all start/end keys .Default `true` - + [[hbase.coprocessor.region.classes]] *`hbase.coprocessor.region.classes`*:: + @@ -1510,7 +1513,7 @@ A comma-separated list of Coprocessors that are loaded by .Default `` - + [[hbase.rest.port]] *`hbase.rest.port`*:: + @@ -1520,7 +1523,7 @@ The port for the HBase REST server. .Default `8080` - + [[hbase.rest.readonly]] *`hbase.rest.readonly`*:: + @@ -1532,7 +1535,7 @@ Defines the mode the REST server will be started in. Possible values are: .Default `false` - + [[hbase.rest.threads.max]] *`hbase.rest.threads.max`*:: + @@ -1547,7 +1550,7 @@ The maximum number of threads of the REST server thread pool. .Default `100` - + [[hbase.rest.threads.min]] *`hbase.rest.threads.min`*:: + @@ -1559,7 +1562,7 @@ The minimum number of threads of the REST server thread pool. 
.Default `2` - + [[hbase.rest.support.proxyuser]] *`hbase.rest.support.proxyuser`*:: + @@ -1569,7 +1572,7 @@ Enables running the REST server to support proxy-user mode. .Default `false` - + [[hbase.defaults.for.version.skip]] *`hbase.defaults.for.version.skip`*:: + @@ -1578,14 +1581,14 @@ Set to true to skip the 'hbase.defaults.for.version' check. Setting this to true can be useful in contexts other than the other side of a maven generation; i.e. running in an ide. You'll want to set this boolean to true to avoid - seeing the RuntimException complaint: "hbase-default.xml file + seeing the RuntimeException complaint: "hbase-default.xml file seems to be for and old version of HBase (\${hbase.version}), this version is X.X.X-SNAPSHOT" + .Default `false` - + [[hbase.coprocessor.master.classes]] *`hbase.coprocessor.master.classes`*:: + @@ -1600,7 +1603,7 @@ A comma-separated list of .Default `` - + [[hbase.coprocessor.abortonerror]] *`hbase.coprocessor.abortonerror`*:: + @@ -1615,7 +1618,7 @@ Set to true to cause the hosting server (master or regionserver) .Default `true` - + [[hbase.online.schema.update.enable]] *`hbase.online.schema.update.enable`*:: + @@ -1625,7 +1628,7 @@ Set true to enable online schema changes. .Default `true` - + [[hbase.table.lock.enable]] *`hbase.table.lock.enable`*:: + @@ -1637,7 +1640,7 @@ Set to true to enable locking the table in zookeeper for schema change operation .Default `true` - + [[hbase.table.max.rowsize]] *`hbase.table.max.rowsize`*:: + @@ -1646,12 +1649,12 @@ Set to true to enable locking the table in zookeeper for schema change operation Maximum size of single row in bytes (default is 1 Gb) for Get'ting or Scan'ning without in-row scan flag set. If row size exceeds this limit RowTooBigException is thrown to client. - + + .Default `1073741824` - + [[hbase.thrift.minWorkerThreads]] *`hbase.thrift.minWorkerThreads`*:: + @@ -1662,7 +1665,7 @@ The "core size" of the thread pool. 
New threads are created on every .Default `16` - + [[hbase.thrift.maxWorkerThreads]] *`hbase.thrift.maxWorkerThreads`*:: + @@ -1674,7 +1677,7 @@ The maximum size of the thread pool. When the pending request queue .Default `1000` - + [[hbase.thrift.maxQueuedRequests]] *`hbase.thrift.maxQueuedRequests`*:: + @@ -1687,7 +1690,7 @@ The maximum number of pending Thrift connections waiting in the queue. If .Default `1000` - + [[hbase.thrift.htablepool.size.max]] *`hbase.thrift.htablepool.size.max`*:: + @@ -1696,12 +1699,12 @@ The upper bound for the table pool used in the Thrift gateways server. Since this is per table name, we assume a single table and so with 1000 default worker threads max this is set to a matching number. For other workloads this number can be adjusted as needed. - + + .Default `1000` - + [[hbase.regionserver.thrift.framed]] *`hbase.regionserver.thrift.framed`*:: + @@ -1710,12 +1713,12 @@ Use Thrift TFramedTransport on the server side. This is the recommended transport for thrift servers and requires a similar setting on the client side. Changing this to false will select the default transport, vulnerable to DoS when malformed requests are issued due to THRIFT-601. - + + .Default `false` - + [[hbase.regionserver.thrift.framed.max_frame_size_in_mb]] *`hbase.regionserver.thrift.framed.max_frame_size_in_mb`*:: + @@ -1725,7 +1728,7 @@ Default frame size when using framed transport .Default `2` - + [[hbase.regionserver.thrift.compact]] *`hbase.regionserver.thrift.compact`*:: + @@ -1735,7 +1738,7 @@ Use Thrift TCompactProtocol binary serialization protocol. 
.Default `false` - + [[hbase.data.umask.enable]] *`hbase.data.umask.enable`*:: + @@ -1746,7 +1749,7 @@ Enable, if true, that file permissions should be assigned .Default `false` - + [[hbase.data.umask]] *`hbase.data.umask`*:: + @@ -1757,7 +1760,7 @@ File permissions that should be used to write data .Default `000` - + [[hbase.metrics.showTableName]] *`hbase.metrics.showTableName`*:: + @@ -1770,7 +1773,7 @@ Whether to include the prefix "tbl.tablename" in per-column family metrics. .Default `true` - + [[hbase.metrics.exposeOperationTimes]] *`hbase.metrics.exposeOperationTimes`*:: + @@ -1782,7 +1785,7 @@ Whether to report metrics about time taken performing an .Default `true` - + [[hbase.snapshot.enabled]] *`hbase.snapshot.enabled`*:: + @@ -1792,7 +1795,7 @@ Set to true to allow snapshots to be taken / restored / cloned. .Default `true` - + [[hbase.snapshot.restore.take.failsafe.snapshot]] *`hbase.snapshot.restore.take.failsafe.snapshot`*:: + @@ -1804,7 +1807,7 @@ Set to true to take a snapshot before the restore operation. .Default `true` - + [[hbase.snapshot.restore.failsafe.name]] *`hbase.snapshot.restore.failsafe.name`*:: + @@ -1816,7 +1819,7 @@ Name of the failsafe snapshot taken by the restore operation. .Default `hbase-failsafe-{snapshot.name}-{restore.timestamp}` - + [[hbase.server.compactchecker.interval.multiplier]] *`hbase.server.compactchecker.interval.multiplier`*:: + @@ -1831,7 +1834,7 @@ The number that determines how often we scan to see if compaction is necessary. .Default `1000` - + [[hbase.lease.recovery.timeout]] *`hbase.lease.recovery.timeout`*:: + @@ -1841,7 +1844,7 @@ How long we wait on dfs lease recovery in total before giving up. .Default `900000` - + [[hbase.lease.recovery.dfs.timeout]] *`hbase.lease.recovery.dfs.timeout`*:: + @@ -1855,7 +1858,7 @@ How long between dfs recover lease invocations. 
Should be larger than the sum of .Default `64000` - + [[hbase.column.max.version]] *`hbase.column.max.version`*:: + @@ -1866,7 +1869,7 @@ New column family descriptors will use this value as the default number of versi .Default `1` - + [[hbase.dfs.client.read.shortcircuit.buffer.size]] *`hbase.dfs.client.read.shortcircuit.buffer.size`*:: + @@ -1880,12 +1883,12 @@ If the DFSClient configuration direct memory. So, we set it down from the default. Make it > the default hbase block size set in the HColumnDescriptor which is usually 64k. - + + .Default `131072` - + [[hbase.regionserver.checksum.verify]] *`hbase.regionserver.checksum.verify`*:: + @@ -1900,13 +1903,13 @@ If the DFSClient configuration fails, we will switch back to using HDFS checksums (so do not disable HDFS checksums! And besides this feature applies to hfiles only, not to WALs). If this parameter is set to false, then hbase will not verify any checksums, - instead it will depend on checksum verification being done in the HDFS client. - + instead it will depend on checksum verification being done in the HDFS client. + + .Default `true` - + [[hbase.hstore.bytes.per.checksum]] *`hbase.hstore.bytes.per.checksum`*:: + @@ -1914,12 +1917,12 @@ If the DFSClient configuration Number of bytes in a newly created checksum chunk for HBase-level checksums in hfile blocks. - + + .Default `16384` - + [[hbase.hstore.checksum.algorithm]] *`hbase.hstore.checksum.algorithm`*:: + @@ -1927,12 +1930,12 @@ If the DFSClient configuration Name of an algorithm that is used to compute checksums. Possible values are NULL, CRC32, CRC32C. - + + .Default `CRC32` - + [[hbase.status.published]] *`hbase.status.published`*:: + @@ -1942,60 +1945,60 @@ If the DFSClient configuration When a region server dies and its recovery starts, the master will push this information to the client application, to let them cut the connection immediately instead of waiting for a timeout. 
- + + .Default `false` - + [[hbase.status.publisher.class]] *`hbase.status.publisher.class`*:: + .Description Implementation of the status publication with a multicast message. - + + .Default `org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher` - + [[hbase.status.listener.class]] *`hbase.status.listener.class`*:: + .Description Implementation of the status listener with a multicast message. - + + .Default `org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener` - + [[hbase.status.multicast.address.ip]] *`hbase.status.multicast.address.ip`*:: + .Description Multicast address to use for the status publication by multicast. - + + .Default `226.1.1.3` - + [[hbase.status.multicast.address.port]] *`hbase.status.multicast.address.port`*:: + .Description Multicast port to use for the status publication by multicast. - + + .Default `16100` - + [[hbase.dynamic.jars.dir]] *`hbase.dynamic.jars.dir`*:: + @@ -2005,12 +2008,12 @@ If the DFSClient configuration dynamically by the region server without the need to restart. However, an already loaded filter/co-processor class would not be un-loaded. See HBASE-1936 for more details. - + + .Default `${hbase.rootdir}/lib` - + [[hbase.security.authentication]] *`hbase.security.authentication`*:: + @@ -2018,24 +2021,24 @@ If the DFSClient configuration Controls whether or not secure authentication is enabled for HBase. Possible values are 'simple' (no authentication), and 'kerberos'. - + + .Default `simple` - + [[hbase.rest.filter.classes]] *`hbase.rest.filter.classes`*:: + .Description Servlet filters for REST service. 
- + + .Default `org.apache.hadoop.hbase.rest.filter.GzipFilter` - + [[hbase.master.loadbalancer.class]] *`hbase.master.loadbalancer.class`*:: + @@ -2046,12 +2049,12 @@ If the DFSClient configuration http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html It replaces the DefaultLoadBalancer as the default (since renamed as the SimpleLoadBalancer). - + + .Default `org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer` - + [[hbase.security.exec.permission.checks]] *`hbase.security.exec.permission.checks`*:: + @@ -2067,28 +2070,28 @@ If the DFSClient configuration section of the HBase online manual. For more information on granting or revoking permissions using the AccessController, see the security section of the HBase online manual. - + + .Default `false` - + [[hbase.procedure.regionserver.classes]] *`hbase.procedure.regionserver.classes`*:: + .Description -A comma-separated list of - org.apache.hadoop.hbase.procedure.RegionServerProcedureManager procedure managers that are - loaded by default on the active HRegionServer process. The lifecycle methods (init/start/stop) - will be called by the active HRegionServer process to perform the specific globally barriered - procedure. After implementing your own RegionServerProcedureManager, just put it in +A comma-separated list of + org.apache.hadoop.hbase.procedure.RegionServerProcedureManager procedure managers that are + loaded by default on the active HRegionServer process. The lifecycle methods (init/start/stop) + will be called by the active HRegionServer process to perform the specific globally barriered + procedure. After implementing your own RegionServerProcedureManager, just put it in HBase's classpath and add the fully qualified class name here. 
- + + .Default `` - + [[hbase.procedure.master.classes]] *`hbase.procedure.master.classes`*:: + @@ -2103,7 +2106,7 @@ A comma-separated list of .Default `` - + [[hbase.coordinated.state.manager.class]] *`hbase.coordinated.state.manager.class`*:: + @@ -2113,7 +2116,7 @@ Fully qualified name of class implementing coordinated state manager. .Default `org.apache.hadoop.hbase.coordination.ZkCoordinatedStateManager` - + [[hbase.regionserver.storefile.refresh.period]] *`hbase.regionserver.storefile.refresh.period`*:: + @@ -2126,12 +2129,12 @@ Fully qualified name of class implementing coordinated state manager. extra Namenode pressure. If the files cannot be refreshed for longer than HFile TTL (hbase.master.hfilecleaner.ttl) the requests are rejected. Configuring HFile TTL to a larger value is also recommended with this setting. - + + .Default `0` - + [[hbase.region.replica.replication.enabled]] *`hbase.region.replica.replication.enabled`*:: + @@ -2139,36 +2142,36 @@ Fully qualified name of class implementing coordinated state manager. Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created - which will tail the logs and replicate the mutatations to region replicas for tables that + which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. - Replication to secondary region replicas works over standard inter-cluster replication. - So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication" + Replication to secondary region replicas works over standard inter-cluster replication. + So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication" to true for this feature to work. 
- + + .Default `false` - + [[hbase.http.filter.initializers]] *`hbase.http.filter.initializers`*:: + .Description - A comma separated list of class names. Each class in the list must extend - org.apache.hadoop.hbase.http.FilterInitializer. The corresponding Filter will - be initialized. Then, the Filter will be applied to all user facing jsp - and servlet web pages. + A comma separated list of class names. Each class in the list must extend + org.apache.hadoop.hbase.http.FilterInitializer. The corresponding Filter will + be initialized. Then, the Filter will be applied to all user facing jsp + and servlet web pages. The ordering of the list defines the ordering of the filters. - The default StaticUserWebFilter add a user principal as defined by the + The default StaticUserWebFilter add a user principal as defined by the hbase.http.staticuser.user property. - + + .Default `org.apache.hadoop.hbase.http.lib.StaticUserWebFilter` - + [[hbase.security.visibility.mutations.checkauths]] *`hbase.security.visibility.mutations.checkauths`*:: + @@ -2176,41 +2179,41 @@ Fully qualified name of class implementing coordinated state manager. This property if enabled, will check whether the labels in the visibility expression are associated with the user issuing the mutation - + + .Default `false` - + [[hbase.http.max.threads]] *`hbase.http.max.threads`*:: + .Description - The maximum number of threads that the HTTP Server will create in its + The maximum number of threads that the HTTP Server will create in its ThreadPool. - + + .Default `10` - + [[hbase.replication.rpc.codec]] *`hbase.replication.rpc.codec`*:: + .Description The codec that is to be used when replication is enabled so that - the tags are also replicated. This is used along with HFileV3 which + the tags are also replicated. This is used along with HFileV3 which supports tags in them. If tags are not used or if the hfile version used is HFileV2 then KeyValueCodec can be used as the replication codec. 
Note that using KeyValueCodecWithTags for replication when there are no tags causes no harm. - + + .Default `org.apache.hadoop.hbase.codec.KeyValueCodecWithTags` - + [[hbase.http.staticuser.user]] *`hbase.http.staticuser.user`*:: + @@ -2219,12 +2222,12 @@ Fully qualified name of class implementing coordinated state manager. The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI (user to be used for browsing files). - + + .Default `dr.stack` - + [[hbase.regionserver.handler.abort.on.error.percent]] *`hbase.regionserver.handler.abort.on.error.percent`*:: +
http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/hbase_history.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase_history.adoc b/src/main/asciidoc/_chapters/hbase_history.adoc index de4aff5..7308b90 100644 --- a/src/main/asciidoc/_chapters/hbase_history.adoc +++ b/src/main/asciidoc/_chapters/hbase_history.adoc @@ -29,9 +29,9 @@ :icons: font :experimental: -* 2006: link:http://research.google.com/archive/bigtable.html[BigTable] paper published by Google. -* 2006 (end of year): HBase development starts. -* 2008: HBase becomes Hadoop sub-project. -* 2010: HBase becomes Apache top-level project. +* 2006: link:http://research.google.com/archive/bigtable.html[BigTable] paper published by Google. +* 2006 (end of year): HBase development starts. +* 2008: HBase becomes Hadoop sub-project. +* 2010: HBase becomes Apache top-level project. :numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/hbck_in_depth.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbck_in_depth.adoc b/src/main/asciidoc/_chapters/hbck_in_depth.adoc index 1b30c59..1e1f9fb 100644 --- a/src/main/asciidoc/_chapters/hbck_in_depth.adoc +++ b/src/main/asciidoc/_chapters/hbck_in_depth.adoc @@ -29,7 +29,7 @@ :experimental: HBaseFsck (hbck) is a tool for checking for region consistency and table integrity problems and repairing a corrupted HBase. -It works in two basic modes -- a read-only inconsistency identifying mode and a multi-phase read-write repair mode. +It works in two basic modes -- a read-only inconsistency identifying mode and a multi-phase read-write repair mode. === Running hbck to identify inconsistencies @@ -42,10 +42,10 @@ $ ./bin/hbase hbck ---- At the end of the commands output it prints OK or tells you the number of INCONSISTENCIES present. 
-You may also want to run run hbck a few times because some inconsistencies can be transient (e.g. +You may also want to run hbck a few times because some inconsistencies can be transient (e.g. cluster is starting up or a region is splitting). Operationally you may want to run hbck regularly and setup alert (e.g. via nagios) if it repeatedly reports inconsistencies . A run of hbck will report a list of inconsistencies along with a brief description of the regions and tables affected. -The using the `-details` option will report more details including a representative listing of all the splits present in all the tables. +The using the `-details` option will report more details including a representative listing of all the splits present in all the tables. [source,bourne] ---- @@ -66,9 +66,9 @@ $ ./bin/hbase hbck TableFoo TableBar === Inconsistencies If after several runs, inconsistencies continue to be reported, you may have encountered a corruption. -These should be rare, but in the event they occur newer versions of HBase include the hbck tool enabled with automatic repair options. +These should be rare, but in the event they occur newer versions of HBase include the hbck tool enabled with automatic repair options. -There are two invariants that when violated create inconsistencies in HBase: +There are two invariants that when violated create inconsistencies in HBase: * HBase's region consistency invariant is satisfied if every region is assigned and deployed on exactly one region server, and all places where this state kept is in accordance. * HBase's table integrity invariant is satisfied if for each table, every possible row key resolves to exactly one region. @@ -77,20 +77,20 @@ Repairs generally work in three phases -- a read-only information gathering phas Starting from version 0.90.0, hbck could detect region consistency problems report on a subset of possible table integrity problems. 
It also included the ability to automatically fix the most common inconsistency, region assignment and deployment consistency problems. This repair could be done by using the `-fix` command line option. -These problems close regions if they are open on the wrong server or on multiple region servers and also assigns regions to region servers if they are not open. +These problems close regions if they are open on the wrong server or on multiple region servers and also assigns regions to region servers if they are not open. Starting from HBase versions 0.90.7, 0.92.2 and 0.94.0, several new command line options are introduced to aid repairing a corrupted HBase. -This hbck sometimes goes by the nickname ``uberhbck''. Each particular version of uber hbck is compatible with the HBase's of the same major version (0.90.7 uberhbck can repair a 0.90.4). However, versions <=0.90.6 and versions <=0.92.1 may require restarting the master or failing over to a backup master. +This hbck sometimes goes by the nickname ``uberhbck''. Each particular version of uber hbck is compatible with the HBase's of the same major version (0.90.7 uberhbck can repair a 0.90.4). However, versions <=0.90.6 and versions <=0.92.1 may require restarting the master or failing over to a backup master. === Localized repairs When repairing a corrupted HBase, it is best to repair the lowest risk inconsistencies first. These are generally region consistency repairs -- localized single region repairs, that only modify in-memory data, ephemeral zookeeper data, or patch holes in the META table. Region consistency requires that the HBase instance has the state of the region's data in HDFS (.regioninfo files), the region's row in the hbase:meta table., and region's deployment/assignments on region servers and the master in accordance. 
-Options for repairing region consistency include: +Options for repairing region consistency include: * `-fixAssignments` (equivalent to the 0.90 `-fix` option) repairs unassigned, incorrectly assigned or multiply assigned regions. -* `-fixMeta` which removes meta rows when corresponding regions are not present in HDFS and adds new meta rows if they regions are present in HDFS while not in META. To fix deployment and assignment problems you can run this command: +* `-fixMeta` which removes meta rows when corresponding regions are not present in HDFS and adds new meta rows if they regions are present in HDFS while not in META. To fix deployment and assignment problems you can run this command: [source,bourne] ---- @@ -177,7 +177,7 @@ $ ./bin/hbase hbck -fixMetaOnly -fixAssignments ==== Special cases: HBase version file is missing HBase's data on the file system requires a version file in order to start. -If this flie is missing, you can use the `-fixVersionFile` option to fabricating a new HBase version file. +If this file is missing, you can use the `-fixVersionFile` option to fabricating a new HBase version file. This assumes that the version of hbck you are running is the appropriate version for the HBase cluster. ==== Special case: Root and META are corrupt. @@ -205,8 +205,8 @@ However, there could be some lingering offline split parents sometimes. They are in META, in HDFS, and not deployed. But HBase can't clean them up. In this case, you can use the `-fixSplitParents` option to reset them in META to be online and not split. -Therefore, hbck can merge them with other regions if fixing overlapping regions option is used. +Therefore, hbck can merge them with other regions if fixing overlapping regions option is used. -This option should not normally be used, and it is not in `-fixAll`. +This option should not normally be used, and it is not in `-fixAll`. 
:numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/mapreduce.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/mapreduce.adoc b/src/main/asciidoc/_chapters/mapreduce.adoc index 1337c79..75718fd 100644 --- a/src/main/asciidoc/_chapters/mapreduce.adoc +++ b/src/main/asciidoc/_chapters/mapreduce.adoc @@ -65,7 +65,7 @@ The dependencies only need to be available on the local `CLASSPATH`. The following example runs the bundled HBase link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`. If you have not set the environment variables expected in the command (the parts prefixed by a `$` sign and surrounded by curly braces), you can use the actual system paths instead. Be sure to use the correct version of the HBase JAR for your system. -The backticks (``` symbols) cause ths shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`. +The backticks (``` symbols) cause the shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`. This example assumes you use a BASH-compatible shell. [source,bash] @@ -279,7 +279,7 @@ That is where the logic for map-task assignment resides. The following is an example of using HBase as a MapReduce source in read-only manner. Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. -There job would be defined as follows... +The job would be defined as follows... [source,java] ---- @@ -592,7 +592,7 @@ public class MyMapper extends TableMapper<Text, LongWritable> { == Speculative Execution It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source. 
-This can either be done on a per-Job basis through properties, on on the entire cluster. +This can either be done on a per-Job basis through properties, or on the entire cluster. Especially for longer running jobs, speculative execution will create duplicate map-tasks which will double-write your data to HBase; this is probably not what you want. See <<spec.ex,spec.ex>> for more information. @@ -613,7 +613,7 @@ The following example shows a Cascading `Flow` which "sinks" data into an HBase // emits two fields: "offset" and "line" Tap source = new Hfs( new TextLine(), inputFileLhs ); -// store data in a HBase cluster +// store data in an HBase cluster // accepts fields "num", "lower", and "upper" // will automatically scope incoming fields to their proper familyname, "left" or "right" Fields keyFields = new Fields( "num" ); http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/ops_mgt.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc index c5f52f5..13835c0 100644 --- a/src/main/asciidoc/_chapters/ops_mgt.adoc +++ b/src/main/asciidoc/_chapters/ops_mgt.adoc @@ -79,7 +79,7 @@ There is a Canary class can help users to canary-test the HBase cluster status, To see the usage, use the `--help` parameter. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help +$ ${HBASE_HOME}/bin/hbase canary -help Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..] where [opts] are: @@ -126,7 +126,7 @@ Following are some examples based on the previous given case. ==== Canary test for every column family (store) of every region of every table ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary +$ ${HBASE_HOME}/bin/hbase canary 3/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. 
column family cf1 in 2ms 13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf2 in 2ms @@ -147,7 +147,7 @@ This is a default behavior of the this tool does. You can also test one or more specific tables. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02 +$ ${HBASE_HOME}/bin/hbase canary test-01 test-02 ---- ==== Canary test with RegionServer granularity @@ -155,7 +155,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02 This will pick one small piece of data from each RegionServer, and can also put your RegionServer name as input options for canary-test specific RegionServer. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver +$ ${HBASE_HOME}/bin/hbase canary -regionserver 13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs2 in 72ms 13/12/09 06:05:17 INFO tool.Canary: Read from table:test-02 on region server:rs3 in 34ms @@ -167,7 +167,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver This will test both table test-01 and test-02. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -e test-0[1-2] +$ ${HBASE_HOME}/bin/hbase canary -e test-0[1-2] ---- ==== Run canary test as daemon mode @@ -176,13 +176,13 @@ Run repeatedly with interval defined in option `-interval` whose default value i This daemon will stop itself and return non-zero error code if any error occurs, due to the default value of option -f is true. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon +$ ${HBASE_HOME}/bin/hbase canary -daemon ---- Run repeatedly with internal 5 seconds and will not stop itself even if errors occur in the test. 
---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon -interval 50000 -f false +$ ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false ---- ==== Force timeout if canary test stuck @@ -192,23 +192,23 @@ Because of this we provide a timeout option to kill the canary test and return a This run sets the timeout value to 60 seconds, the default value is 600 seconds. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000 +$ ${HBASE_HOME}/bin/hbase canary -t 600000 ---- ==== Enable write sniffing in canary By default, the canary tool only check the read operations, it's hard to find the problem in the write path. To enable the write sniffing, you can run canary with the `-writeSniffing` option. -When the write sniffing is enabled, the canary tool will create a hbase table and make sure the +When the write sniffing is enabled, the canary tool will create an hbase table and make sure the regions of the table distributed on all region servers. In each sniffing period, the canary will try to put data to these regions to check the write availability of each region server. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -writeSniffing +$ ${HBASE_HOME}/bin/hbase canary -writeSniffing ---- The default write table is `hbase:canary` and can be specified by the option `-writeTable`. ---- -$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -writeSniffing -writeTable ns:canary +$ ${HBASE_HOME}/bin/hbase canary -writeSniffing -writeTable ns:canary ---- The default value size of each put is 10 bytes and you can set it by the config key: @@ -351,7 +351,7 @@ You can invoke it via the HBase cli with the 'wal' command. [NOTE] ==== Prior to version 2.0, the WAL Pretty Printer was called the `HLogPrettyPrinter`, after an internal name for HBase's write ahead log. -In those versions, you can pring the contents of a WAL using the same configuration as above, but with the 'hlog' command. 
+In those versions, you can print the contents of a WAL using the same configuration as above, but with the 'hlog' command. ---- $ ./bin/hbase hlog hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 @@ -523,7 +523,7 @@ row9 c1 c2 row10 c1 c2 ---- -For ImportTsv to use this imput file, the command line needs to look like this: +For ImportTsv to use this input file, the command line needs to look like this: ---- @@ -781,7 +781,7 @@ To decommission a loaded RegionServer, run the following: +$ ==== The `HOSTNAME` passed to _graceful_stop.sh_ must match the hostname that hbase is using to identify RegionServers. Check the list of RegionServers in the master UI for how HBase is referring to servers. -Its usually hostname but can also be FQDN. +It's usually hostname but can also be FQDN. Whatever HBase is using, this is what you should pass the _graceful_stop.sh_ decommission script. If you pass IPs, the script is not yet smart enough to make a hostname (or FQDN) of it and so it will fail when it checks if server is currently running; the graceful unloading of regions will not run. ==== @@ -821,12 +821,12 @@ Hence, it is better to manage the balancer apart from `graceful_stop` reenabling [[draining.servers]] ==== Decommissioning several Regions Servers concurrently -If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping mutiple RegionServers concurrently. +If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping multiple RegionServers concurrently. To gracefully drain multiple regionservers at the same time, RegionServers can be put into a "draining" state. This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the _hbase_root/draining_ znode. This znode has format `name,port,startcode` just like the regionserver entries under _hbase_root/rs_ znode. 
-Without this facility, decommissioning mulitple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. +Without this facility, decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining. Marking RegionServers to be in the draining state prevents this from happening. See this link:http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html[blog post] for more details. @@ -991,7 +991,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h Restart the region server for the changes to take effect. To change the sampling rate for the default sink, edit the line beginning with `*.period`. -To filter which metrics are emitted or to extend the metrics framework, see link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html +To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html .HBase Metrics and Ganglia [NOTE] @@ -1014,15 +1014,15 @@ Rather than listing each metric which HBase emits by default, you can browse thr Different metrics are exposed for the Master process and each region server process. .Procedure: Access a JSON Output of Available Metrics -. After starting HBase, access the region server's web UI, at `http://REGIONSERVER_HOSTNAME:60030` by default (or port 16030 in HBase 1.0+). +. After starting HBase, access the region server's web UI, at pass:[http://REGIONSERVER_HOSTNAME:60030] by default (or port 16030 in HBase 1.0+). . Click the [label]#Metrics Dump# link near the top. The metrics for the region server are presented as a dump of the JMX bean in JSON format. This will dump out all metrics names and their values. 
- To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60030/jmx?description=true`. + To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60030/jmx?description=true]. Not all beans and attributes have descriptions. -. To view metrics for the Master, connect to the Master's web UI instead (defaults to `http://localhost:60010` or port 16010 in HBase 1.0+) and click its [label]#Metrics +. To view metrics for the Master, connect to the Master's web UI instead (defaults to pass:[http://localhost:60010] or port 16010 in HBase 1.0+) and click its [label]#Metrics Dump# link. - To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60010/jmx?description=true`. + To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60010/jmx?description=true]. Not all beans and attributes have descriptions. @@ -1341,9 +1341,9 @@ disable_peer <ID>:: remove_peer <ID>:: Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs. enable_table_replication <TABLE_NAME>:: - Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families. + Enable the table replication switch for all its column families. 
If the table is not found in the destination cluster then it will create one with the same name and column families. disable_table_replication <TABLE_NAME>:: - Disable the table replication switch for all it's column families. + Disable the table replication switch for all its column families. === Verifying Replicated Data @@ -1462,7 +1462,7 @@ Speed is also limited by total size of the list of edits to replicate per slave, With this configuration, a master cluster region server with three slaves would use at most 192 MB to store data to replicate. This does not account for the data which was filtered but not garbage collected. -Once the maximum size of edits has been buffered or the reader reaces the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return. +Once the maximum size of edits has been buffered or the reader reaches the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return. If the RPC was successful, the source determines whether the current file has been emptied or it contains more data which needs to be read. If the file has been emptied, the source deletes the znode in the queue. Otherwise, it registers the new offset in the log's znode. @@ -1778,7 +1778,7 @@ but still suboptimal compared to a mechanism which allows large requests to be s into multiple smaller ones. HBASE-10993 introduces such a system for deprioritizing long-running scanners. There -are two types of queues,`fifo` and `deadline`.To configure the type of queue used, +are two types of queues, `fifo` and `deadline`. 
To configure the type of queue used, configure the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There is no way to estimate how long each request may take, so de-prioritization only affects scans, and is based on the number of “next” calls a scan request has made. An assumption @@ -2049,7 +2049,7 @@ Aside from the disk space necessary to store the data, one RS may not be able to [[ops.capacity.nodes.throughput]] ==== Read/Write throughput -Number of nodes can also be driven by required thoughput for reads and/or writes. +Number of nodes can also be driven by required throughput for reads and/or writes. The throughput one can get per node depends a lot on data (esp. key/value sizes) and request patterns, as well as node and system configuration. Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count. @@ -2214,7 +2214,7 @@ or in code it would be as follows: [source,java] ---- -void rename(Admin admin, String oldTableName, String newTableName) { +void rename(Admin admin, String oldTableName, TableName newTableName) { String snapshotName = randomName(); admin.disableTable(oldTableName); admin.snapshot(snapshotName, oldTableName); http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/other_info.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/other_info.adoc b/src/main/asciidoc/_chapters/other_info.adoc index 046b747..6143876 100644 --- a/src/main/asciidoc/_chapters/other_info.adoc +++ b/src/main/asciidoc/_chapters/other_info.adoc @@ -31,50 +31,50 @@ [[other.info.videos]] === HBase Videos -.Introduction to HBase -* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011).
-* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010). -link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011). +.Introduction to HBase +* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). +* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010). +link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011). -link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010). +link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010). [[other.info.pres]] === HBase Presentations (Slides) -link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011). +link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011). 
-link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). +link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). -link:http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install[Getting The Most From Your HBase Install] by Ryan Rawson, Jonathan Gray (Hadoop World 2009). +link:http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install[Getting The Most From Your HBase Install] by Ryan Rawson, Jonathan Gray (Hadoop World 2009). [[other.info.papers]] === HBase Papers -link:http://research.google.com/archive/bigtable.html[BigTable] by Google (2006). +link:http://research.google.com/archive/bigtable.html[BigTable] by Google (2006). -link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS Locality] by Lars George (2010). +link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS Locality] by Lars George (2010). -link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases] by Ian Varley (2009). +link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases] by Ian Varley (2009). [[other.info.sites]] === HBase Sites -link:http://www.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information. +link:http://www.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information. -* link:http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems. 
+* link:http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems. -link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations. +link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations. -link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone. +link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone. [[other.info.books]] === HBase Books -link:http://shop.oreilly.com/product/0636920014348.do[HBase: The Definitive Guide] by Lars George. +link:http://shop.oreilly.com/product/0636920014348.do[HBase: The Definitive Guide] by Lars George. [[other.info.books.hadoop]] === Hadoop Books -link:http://shop.oreilly.com/product/9780596521981.do[Hadoop: The Definitive Guide] by Tom White. +link:http://shop.oreilly.com/product/9780596521981.do[Hadoop: The Definitive Guide] by Tom White. :numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/c07ddc6d/src/main/asciidoc/_chapters/performance.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc index bf0e790..5155f0a 100644 --- a/src/main/asciidoc/_chapters/performance.adoc +++ b/src/main/asciidoc/_chapters/performance.adoc @@ -88,7 +88,7 @@ Multiple rack configurations carry the same potential issues as multiple switche * Poor switch capacity performance * Insufficient uplink to another rack -If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks. 
+If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks. The downside of this method however, is in the overhead of ports that could potentially be used. An example of this is, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks gives you a poor ROI, using too few however can mean you're not getting the most out of your cluster. @@ -102,12 +102,12 @@ Are all the network interfaces functioning correctly? Are you sure? See the Trou [[perf.network.call_me_maybe]] === Network Consistency and Partition Tolerance -The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three charateristics: -- *C*onsistency -- all nodes see the same data. +The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics: +- *C*onsistency -- all nodes see the same data. - *A*vailability -- every request receives a response about whether it succeeded or failed. - *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others. -HBase favors consistency and partition tolerance, where a decision has to be made. Coda Hale explains why partition tolerance is so important, in http://codahale.com/you-cant-sacrifice-partition-tolerance/. +HBase favors consistency and partition tolerance, where a decision has to be made. Coda Hale explains why partition tolerance is so important, in http://codahale.com/you-cant-sacrifice-partition-tolerance/. 
Robert Yokota used an automated testing framework called link:https://aphyr.com/tags/jepsen[Jepson] to test HBase's partition tolerance in the face of network partitions, using techniques modeled after Aphyr's link:https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions[Call Me Maybe] series. The results, available as a link:https://rayokota.wordpress.com/2015/09/30/call-me-maybe-hbase/[blog post] and an link:https://rayokota.wordpress.com/2015/09/30/call-me-maybe-hbase-addendum/[addendum], show that HBase performs correctly. @@ -556,7 +556,7 @@ When writing a lot of data to an HBase table from a MR job (e.g., with link:http When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. -For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the the above case. +For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the above case. [[perf.one.region]] === Anti-Pattern: One Hot Region @@ -565,7 +565,7 @@ If all your data is being written to one region at a time, then re-read the sect Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. 
-As the HBase client communicates directly with the RegionServers, this can be obtained via link:hhttp://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte[])[Table.getRegionLocation]. +As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation]. See <<precreate.regions>>, as well as <<perf.configurations>> @@ -607,7 +607,7 @@ When columns are selected explicitly with `scan.addColumn`, HBase will schedule When rows have few columns and each column has only a few versions this can be inefficient. A seek operation is generally slower if does not seek at least past 5-10 columns/versions or 512-1024 bytes. -In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set the on Scan object. +In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set on the Scan object. The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled: [source,java] @@ -731,7 +731,7 @@ However, if hedged reads are enabled, the client waits some configurable amount Whichever read returns first is used, and the other read request is discarded. Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection. -Because a HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment. 
+Because an HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment. .Configuration for Hedged Reads * `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads. @@ -782,7 +782,8 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer. It will execute an RegionServer RPC with each invocation. For a large number of deletes, consider `Table.delete(List)`. -See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29 +See ++++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29">hbase.client.Delete</a>+++. [[perf.hdfs]] == HDFS @@ -869,7 +870,7 @@ If you are running on EC2 and post performance questions on the dist-list, pleas == Collocating HBase and MapReduce It is often recommended to have different clusters for HBase and MapReduce. -A better qualification of this is: don't collocate a HBase that serves live requests with a heavy MR workload. +A better qualification of this is: don't collocate an HBase that serves live requests with a heavy MR workload. OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former. For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible. MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.
