http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/hbase-default.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase-default.adoc b/src/main/asciidoc/_chapters/hbase-default.adoc
index 8df9b17..ffc018b 100644
--- a/src/main/asciidoc/_chapters/hbase-default.adoc
+++ b/src/main/asciidoc/_chapters/hbase-default.adoc
@@ -46,7 +46,7 @@ Temporary directory on the local filesystem.
 .Default
 `${java.io.tmpdir}/hbase-${user.name}`
 
-  
+
 [[hbase.rootdir]]
 *`hbase.rootdir`*::
 +
@@ -64,7 +64,7 @@ The directory shared by region servers and into
 .Default
 `${hbase.tmp.dir}/hbase`
 
-  
+
 [[hbase.cluster.distributed]]
 *`hbase.cluster.distributed`*::
 +
@@ -77,7 +77,7 @@ The mode the cluster will be in. Possible values are
 .Default
 `false`
 
-  
+
 [[hbase.zookeeper.quorum]]
 *`hbase.zookeeper.quorum`*::
 +
@@ -97,7 +97,7 @@ Comma separated list of servers in the ZooKeeper ensemble
 .Default
 `localhost`
 
-  
+
 [[hbase.local.dir]]
 *`hbase.local.dir`*::
 +
@@ -108,7 +108,7 @@ Directory on the local filesystem to be used
 .Default
 `${hbase.tmp.dir}/local/`
 
-  
+
 [[hbase.master.info.port]]
 *`hbase.master.info.port`*::
 +
@@ -119,18 +119,18 @@ The port for the HBase Master web UI.
 .Default
 `16010`
 
-  
+
 [[hbase.master.info.bindAddress]]
 *`hbase.master.info.bindAddress`*::
 +
 .Description
 The bind address for the HBase Master web UI
-  
+
 +
 .Default
 `0.0.0.0`
 
-  
+
 [[hbase.master.logcleaner.plugins]]
 *`hbase.master.logcleaner.plugins`*::
 +
@@ -145,7 +145,7 @@ A comma-separated list of BaseLogCleanerDelegate invoked by
 .Default
 `org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner`
 
-  
+
 [[hbase.master.logcleaner.ttl]]
 *`hbase.master.logcleaner.ttl`*::
 +
@@ -156,7 +156,7 @@ Maximum time a WAL can stay in the .oldlogdir directory,
 .Default
 `600000`
 
-  
+
 [[hbase.master.hfilecleaner.plugins]]
 *`hbase.master.hfilecleaner.plugins`*::
 +
@@ -172,7 +172,7 @@ A comma-separated list of BaseHFileCleanerDelegate invoked by
 .Default
 `org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner`
 
-  
+
 [[hbase.master.catalog.timeout]]
 *`hbase.master.catalog.timeout`*::
 +
@@ -183,7 +183,7 @@ Timeout value for the Catalog Janitor from the master to
 .Default
 `600000`
 
-  
+
 [[hbase.master.infoserver.redirect]]
 *`hbase.master.infoserver.redirect`*::
 +
@@ -195,7 +195,7 @@ Whether or not the Master listens to the Master web
 .Default
 `true`
 
-  
+
 [[hbase.regionserver.port]]
 *`hbase.regionserver.port`*::
 +
@@ -205,7 +205,7 @@ The port the HBase RegionServer binds to.
 .Default
 `16020`
 
-  
+
 [[hbase.regionserver.info.port]]
 *`hbase.regionserver.info.port`*::
 +
@@ -216,7 +216,7 @@ The port for the HBase RegionServer web UI
 .Default
 `16030`
 
-  
+
 [[hbase.regionserver.info.bindAddress]]
 *`hbase.regionserver.info.bindAddress`*::
 +
@@ -226,7 +226,7 @@ The address for the HBase RegionServer web UI
 .Default
 `0.0.0.0`
 
-  
+
 [[hbase.regionserver.info.port.auto]]
 *`hbase.regionserver.info.port.auto`*::
 +
@@ -239,7 +239,7 @@ Whether or not the Master or RegionServer
 .Default
 `false`
 
-  
+
 [[hbase.regionserver.handler.count]]
 *`hbase.regionserver.handler.count`*::
 +
@@ -250,7 +250,7 @@ Count of RPC Listener instances spun up on RegionServers.
 .Default
 `30`
 
-  
+
 [[hbase.ipc.server.callqueue.handler.factor]]
 *`hbase.ipc.server.callqueue.handler.factor`*::
 +
@@ -262,7 +262,7 @@ Factor to determine the number of call queues.
 .Default
 `0.1`
 
-  
+
 [[hbase.ipc.server.callqueue.read.ratio]]
 *`hbase.ipc.server.callqueue.read.ratio`*::
 +
@@ -287,12 +287,12 @@ Split the call queues into read and write queues.
 and 2 queues will contain only write requests.
 a read.ratio of 1 means that: 9 queues will contain only read requests
 and 1 queues will contain only write requests.
-  
+
 +
 .Default
 `0`
 
-  
+
 [[hbase.ipc.server.callqueue.scan.ratio]]
 *`hbase.ipc.server.callqueue.scan.ratio`*::
 +
@@ -313,12 +313,12 @@ Given the number of read call queues, calculated from the total number
 and 4 queues will contain only short-read requests.
 a scan.ratio of 0.8 means that: 6 queues will contain only long-read requests
 and 2 queues will contain only short-read requests.
-  
+
 +
 .Default
 `0`
 
-  
+
 [[hbase.regionserver.msginterval]]
 *`hbase.regionserver.msginterval`*::
 +
@@ -329,7 +329,7 @@ Interval between messages from the RegionServer to Master
 .Default
 `3000`
 
-  
+
 [[hbase.regionserver.regionSplitLimit]]
 *`hbase.regionserver.regionSplitLimit`*::
 +
@@ -342,7 +342,7 @@ Limit for the number of regions after which no more region
 .Default
 `2147483647`
 
-  
+
 [[hbase.regionserver.logroll.period]]
 *`hbase.regionserver.logroll.period`*::
 +
@@ -353,7 +353,7 @@ Period at which we will roll the commit log regardless
 .Default
 `3600000`
 
-  
+
 [[hbase.regionserver.logroll.errors.tolerated]]
 *`hbase.regionserver.logroll.errors.tolerated`*::
 +
@@ -367,7 +367,7 @@ The number of consecutive WAL close errors we will allow
 .Default
 `2`
 
-  
+
 [[hbase.regionserver.hlog.reader.impl]]
 *`hbase.regionserver.hlog.reader.impl`*::
 +
@@ -377,7 +377,7 @@ The WAL file reader implementation.
 .Default
 `org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader`
 
-  
+
 [[hbase.regionserver.hlog.writer.impl]]
 *`hbase.regionserver.hlog.writer.impl`*::
 +
@@ -387,7 +387,7 @@ The WAL file writer implementation.
 .Default
 `org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter`
 
-  
+
 [[hbase.master.distributed.log.replay]]
 *`hbase.master.distributed.log.replay`*::
 +
@@ -397,13 +397,13 @@ Enable 'distributed log replay' as default engine splitting
 back to the old mode 'distributed log splitter', set the value to 'false'.
 'Disributed log replay' improves MTTR because it does not write intermediate files.
 'DLR' required that 'hfile.format.version'
-  be set to version 3 or higher. 
-  
+  be set to version 3 or higher.
+
 +
 .Default
 `true`
 
-  
+
 [[hbase.regionserver.global.memstore.size]]
 *`hbase.regionserver.global.memstore.size`*::
 +
@@ -416,20 +416,20 @@ Maximum size of all memstores in a region server before new
 .Default
 `0.4`
 
-  
+
 [[hbase.regionserver.global.memstore.size.lower.limit]]
 *`hbase.regionserver.global.memstore.size.lower.limit`*::
 +
 .Description
 Maximum size of all memstores in a region server before flushes are forced.
 Defaults to 95% of hbase.regionserver.global.memstore.size.
-  A 100% value for this value causes the minimum possible flushing to occur when updates are 
+  A 100% value for this value causes the minimum possible flushing to occur when updates are
 blocked due to memstore limiting.
 +
 .Default
 `0.95`
 
-  
+
 [[hbase.regionserver.optionalcacheflushinterval]]
 *`hbase.regionserver.optionalcacheflushinterval`*::
 +
@@ -441,7 +441,7 @@ Maximum size of all memstores in a region server before flushes are forced.
 .Default
 `3600000`
 
-  
+
 [[hbase.regionserver.catalog.timeout]]
 *`hbase.regionserver.catalog.timeout`*::
 +
@@ -451,7 +451,7 @@ Timeout value for the Catalog Janitor from the regionserver to META.
 .Default
 `600000`
 
-  
+
 [[hbase.regionserver.dns.interface]]
 *`hbase.regionserver.dns.interface`*::
 +
@@ -462,7 +462,7 @@ The name of the Network Interface from which a region server
 .Default
 `default`
 
-  
+
 [[hbase.regionserver.dns.nameserver]]
 *`hbase.regionserver.dns.nameserver`*::
 +
@@ -474,7 +474,7 @@ The host name or IP address of the name server (DNS)
 .Default
 `default`
 
-  
+
 [[hbase.regionserver.region.split.policy]]
 *`hbase.regionserver.region.split.policy`*::
 +
@@ -483,12 +483,12 @@ The host name or IP address of the name server (DNS)
 A split policy determines when a region should be split. The various other split policies that
 are available currently are ConstantSizeRegionSplitPolicy,
 DisabledRegionSplitPolicy, DelimitedKeyPrefixRegionSplitPolicy, KeyPrefixRegionSplitPolicy etc.
-  
+
 +
 .Default
 `org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy`
 
-  
+
 [[zookeeper.session.timeout]]
 *`zookeeper.session.timeout`*::
 +
@@ -502,12 +502,12 @@ ZooKeeper session timeout in milliseconds. It is used in two different ways.
 to an ensemble managed with a different configuration will be subjected that
 ensemble's maxSessionTimeout. So, even though HBase might propose using 90 seconds, the
 ensemble can have a max timeout lower than this and it will take precedence. The current
 default that ZK ships with is 40 seconds, which is lower than HBase's.
-  
+
 +
 .Default
 `90000`
 
-  
+
 [[zookeeper.znode.parent]]
 *`zookeeper.znode.parent`*::
 +
@@ -520,7 +520,7 @@ Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
 .Default
 `/hbase`
 
-  
+
 [[zookeeper.znode.rootserver]]
 *`zookeeper.znode.rootserver`*::
 +
@@ -533,7 +533,7 @@ Path to ZNode holding root region location. This is written by
 .Default
 `root-region-server`
 
-  
+
 [[zookeeper.znode.acl.parent]]
 *`zookeeper.znode.acl.parent`*::
 +
@@ -543,7 +543,7 @@ Root ZNode for access control lists.
 .Default
 `acl`
 
-  
+
 [[hbase.zookeeper.dns.interface]]
 *`hbase.zookeeper.dns.interface`*::
 +
@@ -554,7 +554,7 @@ The name of the Network Interface from which a ZooKeeper server
 .Default
 `default`
 
-  
+
 [[hbase.zookeeper.dns.nameserver]]
 *`hbase.zookeeper.dns.nameserver`*::
 +
@@ -566,7 +566,7 @@ The host name or IP address of the name server (DNS)
 .Default
 `default`
 
-  
+
 [[hbase.zookeeper.peerport]]
 *`hbase.zookeeper.peerport`*::
 +
@@ -578,7 +578,7 @@ Port used by ZooKeeper peers to talk to each other.
 .Default
 `2888`
 
-  
+
 [[hbase.zookeeper.leaderport]]
 *`hbase.zookeeper.leaderport`*::
 +
@@ -590,7 +590,7 @@ Port used by ZooKeeper for leader election.
 .Default
 `3888`
 
-  
+
 [[hbase.zookeeper.useMulti]]
 *`hbase.zookeeper.useMulti`*::
 +
@@ -616,7 +616,7 @@ Property from ZooKeeper's config zoo.cfg.
 .Default
 `10`
 
-  
+
 [[hbase.zookeeper.property.syncLimit]]
 *`hbase.zookeeper.property.syncLimit`*::
 +
@@ -628,7 +628,7 @@ Property from ZooKeeper's config zoo.cfg.
 .Default
 `5`
 
-  
+
 [[hbase.zookeeper.property.dataDir]]
 *`hbase.zookeeper.property.dataDir`*::
 +
@@ -639,7 +639,7 @@ Property from ZooKeeper's config zoo.cfg.
 .Default
 `${hbase.tmp.dir}/zookeeper`
 
-  
+
 [[hbase.zookeeper.property.clientPort]]
 *`hbase.zookeeper.property.clientPort`*::
 +
@@ -650,7 +650,7 @@ Property from ZooKeeper's config zoo.cfg.
 .Default
 `2181`
 
-  
+
 [[hbase.zookeeper.property.maxClientCnxns]]
 *`hbase.zookeeper.property.maxClientCnxns`*::
 +
@@ -664,7 +664,7 @@ Property from ZooKeeper's config zoo.cfg.
 .Default
 `300`
 
-  
+
 [[hbase.client.write.buffer]]
 *`hbase.client.write.buffer`*::
 +
@@ -679,7 +679,7 @@ Default size of the HTable client write buffer in bytes.
 .Default
 `2097152`
 
-  
+
 [[hbase.client.pause]]
 *`hbase.client.pause`*::
 +
@@ -692,7 +692,7 @@ General client pause value. Used mostly as value to wait
 .Default
 `100`
 
-  
+
 [[hbase.client.retries.number]]
 *`hbase.client.retries.number`*::
 +
@@ -707,7 +707,7 @@ Maximum retries.
Used as maximum for all retryable
 .Default
 `35`
 
-  
+
 [[hbase.client.max.total.tasks]]
 *`hbase.client.max.total.tasks`*::
 +
@@ -718,7 +718,7 @@ The maximum number of concurrent tasks a single HTable instance will
 .Default
 `100`
 
-  
+
 [[hbase.client.max.perserver.tasks]]
 *`hbase.client.max.perserver.tasks`*::
 +
@@ -729,7 +729,7 @@ The maximum number of concurrent tasks a single HTable instance will
 .Default
 `5`
 
-  
+
 [[hbase.client.max.perregion.tasks]]
 *`hbase.client.max.perregion.tasks`*::
 +
@@ -742,7 +742,7 @@ The maximum number of concurrent connections the client will
 .Default
 `1`
 
-  
+
 [[hbase.client.scanner.caching]]
 *`hbase.client.scanner.caching`*::
 +
@@ -757,7 +757,7 @@ Number of rows that will be fetched when calling next
 .Default
 `100`
 
-  
+
 [[hbase.client.keyvalue.maxsize]]
 *`hbase.client.keyvalue.maxsize`*::
 +
@@ -772,7 +772,7 @@ Specifies the combined maximum allowed size of a KeyValue
 .Default
 `10485760`
 
-  
+
 [[hbase.client.scanner.timeout.period]]
 *`hbase.client.scanner.timeout.period`*::
 +
@@ -782,7 +782,7 @@ Client scanner lease period in milliseconds.
 .Default
 `60000`
 
-  
+
 [[hbase.client.localityCheck.threadPoolSize]]
 *`hbase.client.localityCheck.threadPoolSize`*::
 +
@@ -792,7 +792,7 @@ Client scanner lease period in milliseconds.
 .Default
 `2`
 
-  
+
 [[hbase.bulkload.retries.number]]
 *`hbase.bulkload.retries.number`*::
 +
@@ -804,7 +804,7 @@ Maximum retries. This is maximum number of iterations
 .Default
 `10`
 
-  
+
 [[hbase.balancer.period
 ]]
 *`hbase.balancer.period
@@ -816,7 +816,7 @@ Period at which the region balancer runs in the Master.
 .Default
 `300000`
 
-  
+
 [[hbase.regions.slop]]
 *`hbase.regions.slop`*::
 +
@@ -826,7 +826,7 @@ Rebalance if any regionserver has average + (average * slop) regions.
 .Default
 `0.2`
 
-  
+
 [[hbase.server.thread.wakefrequency]]
 *`hbase.server.thread.wakefrequency`*::
 +
@@ -837,7 +837,7 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `10000`
 
-  
+
 [[hbase.server.versionfile.writeattempts]]
 *`hbase.server.versionfile.writeattempts`*::
 +
@@ -850,7 +850,7 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `3`
 
-  
+
 [[hbase.hregion.memstore.flush.size]]
 *`hbase.hregion.memstore.flush.size`*::
 +
@@ -863,7 +863,7 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `134217728`
 
-  
+
 [[hbase.hregion.percolumnfamilyflush.size.lower.bound]]
 *`hbase.hregion.percolumnfamilyflush.size.lower.bound`*::
 +
@@ -876,12 +876,12 @@ Time to sleep in between searches for work (in milliseconds).
 memstore size more than this, all the memstores will be flushed
 (just as usual). This value should be less than half of the total memstore
 threshold (hbase.hregion.memstore.flush.size).
-  
+
 +
 .Default
 `16777216`
 
-  
+
 [[hbase.hregion.preclose.flush.size]]
 *`hbase.hregion.preclose.flush.size`*::
 +
@@ -900,7 +900,7 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `5242880`
 
-  
+
 [[hbase.hregion.memstore.block.multiplier]]
 *`hbase.hregion.memstore.block.multiplier`*::
 +
@@ -916,7 +916,7 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `4`
 
-  
+
 [[hbase.hregion.memstore.mslab.enabled]]
 *`hbase.hregion.memstore.mslab.enabled`*::
 +
@@ -930,19 +930,19 @@ Time to sleep in between searches for work (in milliseconds).
 .Default
 `true`
 
-  
+
 [[hbase.hregion.max.filesize]]
 *`hbase.hregion.max.filesize`*::
 +
 .Description
-  Maximum HFile size. If the sum of the sizes of a region's HFiles has grown to exceed this 
+  Maximum HFile size. If the sum of the sizes of a region's HFiles has grown to exceed this
 value, the region is split in two.
 +
 .Default
 `10737418240`
 
-  
+
 [[hbase.hregion.majorcompaction]]
 *`hbase.hregion.majorcompaction`*::
 +
@@ -959,7 +959,7 @@ Time between major compactions, expressed in milliseconds.
Set to 0 to disable
 .Default
 `604800000`
 
-  
+
 [[hbase.hregion.majorcompaction.jitter]]
 *`hbase.hregion.majorcompaction.jitter`*::
 +
@@ -972,32 +972,32 @@ A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occ
 .Default
 `0.50`
 
-  
+
 [[hbase.hstore.compactionThreshold]]
 *`hbase.hstore.compactionThreshold`*::
 +
 .Description
-  If more than this number of StoreFiles exist in any one Store 
-  (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all 
+  If more than this number of StoreFiles exist in any one Store
+  (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all
 StoreFiles into a single StoreFile. Larger values delay compaction, but when compaction does
 occur, it takes longer to complete.
 +
 .Default
 `3`
 
-  
+
 [[hbase.hstore.flusher.count]]
 *`hbase.hstore.flusher.count`*::
 +
 .Description
 The number of flush threads. With fewer threads, the MemStore flushes will be
 queued. With more threads, the flushes will be executed in parallel, increasing the load on
-  HDFS, and potentially causing more compactions. 
+  HDFS, and potentially causing more compactions.
 +
 .Default
 `2`
 
-  
+
 [[hbase.hstore.blockingStoreFiles]]
 *`hbase.hstore.blockingStoreFiles`*::
 +
@@ -1009,40 +1009,40 @@ A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occ
 .Default
 `10`
 
-  
+
 [[hbase.hstore.blockingWaitTime]]
 *`hbase.hstore.blockingWaitTime`*::
 +
 .Description
 The time for which a region will block updates after reaching the StoreFile limit
-  defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the region will stop 
+  defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the region will stop
 blocking updates even if a compaction has not been completed.
 +
 .Default
 `90000`
 
-  
+
 [[hbase.hstore.compaction.min]]
 *`hbase.hstore.compaction.min`*::
 +
 .Description
-The minimum number of StoreFiles which must be eligible for compaction before
-  compaction can run.
The goal of tuning hbase.hstore.compaction.min is to avoid ending up with
-  too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction 
+The minimum number of StoreFiles which must be eligible for compaction before
+  compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid ending up with
+  too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction
 each time you have two StoreFiles in a Store, and this is probably not appropriate. If you
-  set this value too high, all the other values will need to be adjusted accordingly. For most 
+  set this value too high, all the other values will need to be adjusted accordingly. For most
 cases, the default value is appropriate. In previous versions of HBase, the parameter
 hbase.hstore.compaction.min was named hbase.hstore.compactionThreshold.
 +
 .Default
 `3`
 
-  
+
 [[hbase.hstore.compaction.max]]
 *`hbase.hstore.compaction.max`*::
 +
 .Description
-The maximum number of StoreFiles which will be selected for a single minor 
+The maximum number of StoreFiles which will be selected for a single minor
 compaction, regardless of the number of eligible StoreFiles. Effectively, the value of
 hbase.hstore.compaction.max controls the length of time it takes a single compaction to
 complete. Setting it larger means that more StoreFiles are included in a compaction. For most
@@ -1051,88 +1051,88 @@ The maximum number of StoreFiles which will be selected for a single minor
 .Default
 `10`
 
-  
+
 [[hbase.hstore.compaction.min.size]]
 *`hbase.hstore.compaction.min.size`*::
 +
 .Description
-A StoreFile smaller than this size will always be eligible for minor compaction. 
-  HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if 
-  they are eligible.
Because this limit represents the "automatic include"limit for all
-  StoreFiles smaller than this value, this value may need to be reduced in write-heavy 
-  environments where many StoreFiles in the 1-2 MB range are being flushed, because every 
+A StoreFile smaller than this size will always be eligible for minor compaction.
+  HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if
+  they are eligible. Because this limit represents the "automatic include"limit for all
+  StoreFiles smaller than this value, this value may need to be reduced in write-heavy
+  environments where many StoreFiles in the 1-2 MB range are being flushed, because every
 StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the
 minimum size and require further compaction. If this parameter is lowered, the ratio check is
-  triggered more quickly. This addressed some issues seen in earlier versions of HBase but 
-  changing this parameter is no longer necessary in most situations. Default: 128 MB expressed 
+  triggered more quickly. This addressed some issues seen in earlier versions of HBase but
+  changing this parameter is no longer necessary in most situations. Default: 128 MB expressed
 in bytes.
 +
 .Default
 `134217728`
 
-  
+
 [[hbase.hstore.compaction.max.size]]
 *`hbase.hstore.compaction.max.size`*::
 +
 .Description
-A StoreFile larger than this size will be excluded from compaction. The effect of 
-  raising hbase.hstore.compaction.max.size is fewer, larger StoreFiles that do not get 
+A StoreFile larger than this size will be excluded from compaction. The effect of
+  raising hbase.hstore.compaction.max.size is fewer, larger StoreFiles that do not get
 compacted often. If you feel that compaction is happening too often without much benefit, you
 can try raising this value. Default: the value of LONG.MAX_VALUE, expressed in bytes.
 +
 .Default
 `9223372036854775807`
 
-  
+
 [[hbase.hstore.compaction.ratio]]
 *`hbase.hstore.compaction.ratio`*::
 +
 .Description
-For minor compaction, this ratio is used to determine whether a given StoreFile 
+For minor compaction, this ratio is used to determine whether a given StoreFile
 which is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its
 effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio
-  is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single 
-  giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the 
+  is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single
+  giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the
 BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and
-  1.4 is recommended. When tuning this value, you are balancing write costs with read costs. 
-  Raising the value (to something like 1.4) will have more write costs, because you will 
-  compact larger StoreFiles. However, during reads, HBase will need to seek through fewer 
-  StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of 
-  Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the 
-  background cost of writes, and use Bloom filters to control the number of StoreFiles touched 
+  1.4 is recommended. When tuning this value, you are balancing write costs with read costs.
+  Raising the value (to something like 1.4) will have more write costs, because you will
+  compact larger StoreFiles. However, during reads, HBase will need to seek through fewer
+  StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of
+  Bloom filters.
Otherwise, you can lower this value to something like 1.0 to reduce the
+  background cost of writes, and use Bloom filters to control the number of StoreFiles touched
 during reads. For most cases, the default value is appropriate.
 +
 .Default
 `1.2F`
 
-  
+
 [[hbase.hstore.compaction.ratio.offpeak]]
 *`hbase.hstore.compaction.ratio.offpeak`*::
 +
 .Description
 Allows you to set a different (by default, more aggressive) ratio for determining
-  whether larger StoreFiles are included in compactions during off-peak hours. Works in the 
-  same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and 
+  whether larger StoreFiles are included in compactions during off-peak hours. Works in the
+  same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and
 hbase.offpeak.end.hour are also enabled.
 +
 .Default
 `5.0F`
 
-  
+
 [[hbase.hstore.time.to.purge.deletes]]
 *`hbase.hstore.time.to.purge.deletes`*::
 +
 .Description
-The amount of time to delay purging of delete markers with future timestamps. If 
-  unset, or set to 0, all delete markers, including those with future timestamps, are purged 
-  during the next major compaction. Otherwise, a delete marker is kept until the major compaction 
+The amount of time to delay purging of delete markers with future timestamps. If
+  unset, or set to 0, all delete markers, including those with future timestamps, are purged
+  during the next major compaction. Otherwise, a delete marker is kept until the major compaction
 which occurs after the marker's timestamp plus the value of this setting, in milliseconds.
-  
+
 +
 .Default
 `0`
 
-  
+
 [[hbase.offpeak.start.hour]]
 *`hbase.offpeak.start.hour`*::
 +
@@ -1143,7 +1143,7 @@ The start of off-peak hours, expressed as an integer between 0 and 23, inclusive
 .Default
 `-1`
 
-  
+
 [[hbase.offpeak.end.hour]]
 *`hbase.offpeak.end.hour`*::
 +
@@ -1154,7 +1154,7 @@ The end of off-peak hours, expressed as an integer between 0 and 23, inclusive.
 .Default
 `-1`
 
-  
+
 [[hbase.regionserver.thread.compaction.throttle]]
 *`hbase.regionserver.thread.compaction.throttle`*::
 +
@@ -1170,19 +1170,19 @@ There are two different thread pools for compactions, one for large compactions
 .Default
 `2684354560`
 
-  
+
 [[hbase.hstore.compaction.kv.max]]
 *`hbase.hstore.compaction.kv.max`*::
 +
 .Description
 The maximum number of KeyValues to read and then write in a batch when flushing
 or compacting. Set this lower if you have big KeyValues and problems with Out Of Memory
-  Exceptions Set this higher if you have wide, small rows. 
+  Exceptions Set this higher if you have wide, small rows.
 +
 .Default
 `10`
 
-  
+
 [[hbase.storescanner.parallel.seek.enable]]
 *`hbase.storescanner.parallel.seek.enable`*::
 +
@@ -1194,7 +1194,7 @@ The maximum number of KeyValues to read and then write in a batch when flushing
 .Default
 `false`
 
-  
+
 [[hbase.storescanner.parallel.seek.threads]]
 *`hbase.storescanner.parallel.seek.threads`*::
 +
@@ -1205,7 +1205,7 @@ The maximum number of KeyValues to read and then write in a batch when flushing
 .Default
 `10`
 
-  
+
 [[hfile.block.cache.size]]
 *`hfile.block.cache.size`*::
 +
@@ -1218,7 +1218,7 @@ Percentage of maximum heap (-Xmx setting) to allocate to block cache
 .Default
 `0.4`
 
-  
+
 [[hfile.block.index.cacheonwrite]]
 *`hfile.block.index.cacheonwrite`*::
 +
@@ -1229,7 +1229,7 @@ This allows to put non-root multi-level index blocks into the block
 .Default
 `false`
 
-  
+
 [[hfile.index.block.max.size]]
 *`hfile.index.block.max.size`*::
 +
@@ -1241,31 +1241,31 @@ When the size of a leaf-level, intermediate-level, or root-level
 .Default
 `131072`
 
-  
+
 [[hbase.bucketcache.ioengine]]
 *`hbase.bucketcache.ioengine`*::
 +
 .Description
-Where to store the contents of the bucketcache. One of: onheap, 
+Where to store the contents of the bucketcache. One of: onheap,
 offheap, or file. If a file, set it to file:PATH_TO_FILE. See
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html for more information.
-  
+
 +
 .Default
 ``
 
-  
+
 [[hbase.bucketcache.combinedcache.enabled]]
 *`hbase.bucketcache.combinedcache.enabled`*::
 +
 .Description
-Whether or not the bucketcache is used in league with the LRU 
-  on-heap block cache. In this mode, indices and blooms are kept in the LRU 
+Whether or not the bucketcache is used in league with the LRU
+  on-heap block cache. In this mode, indices and blooms are kept in the LRU
 blockcache and the data blocks are kept in the bucketcache.
 +
 .Default
 `true`
 
-  
+
 [[hbase.bucketcache.size]]
 *`hbase.bucketcache.size`*::
 +
@@ -1276,19 +1276,19 @@ Used along with bucket cache, this is a float that EITHER represents a percentag
 .Default
 `0` when specified as a float
 
-  
+
 [[hbase.bucketcache.sizes]]
 *`hbase.bucketcache.sizes`*::
 +
 .Description
-A comma-separated list of sizes for buckets for the bucketcache 
-  if you use multiple sizes. Should be a list of block sizes in order from smallest 
+A comma-separated list of sizes for buckets for the bucketcache
+  if you use multiple sizes. Should be a list of block sizes in order from smallest
 to largest. The sizes you use will depend on your data access patterns.
 +
 .Default
 ``
 
-  
+
 [[hfile.format.version]]
 *`hfile.format.version`*::
 +
@@ -1296,13 +1296,13 @@ A comma-separated list of sizes for buckets for the bucketcache
 The HFile format version to use for new files. Version 3 adds support
 for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags).
 Distributed Log Replay requires that tags are enabled. Also see the configuration
-  'hbase.replication.rpc.codec'. 
-  
+  'hbase.replication.rpc.codec'.
+
 +
 .Default
 `3`
 
-  
+
 [[hfile.block.bloom.cacheonwrite]]
 *`hfile.block.bloom.cacheonwrite`*::
 +
@@ -1312,7 +1312,7 @@ Enables cache-on-write for inline blocks of a compound Bloom filter.
 .Default
 `false`
 
-  
+
 [[io.storefile.bloom.block.size]]
 *`io.storefile.bloom.block.size`*::
 +
@@ -1325,7 +1325,7 @@ The size in bytes of a single block ("chunk") of a compound Bloom
 .Default
 `131072`
 
-  
+
 [[hbase.rs.cacheblocksonwrite]]
 *`hbase.rs.cacheblocksonwrite`*::
 +
@@ -1336,7 +1336,7 @@ Whether an HFile block should be added to the block cache when the
 .Default
 `false`
 
-  
+
 [[hbase.rpc.timeout]]
 *`hbase.rpc.timeout`*::
 +
@@ -1348,7 +1348,7 @@ This is for the RPC layer to define how long HBase client applications
 .Default
 `60000`
 
-  
+
 [[hbase.rpc.shortoperation.timeout]]
 *`hbase.rpc.shortoperation.timeout`*::
 +
@@ -1361,7 +1361,7 @@ This is another version of "hbase.rpc.timeout". For those RPC operation
 .Default
 `10000`
 
-  
+
 [[hbase.ipc.client.tcpnodelay]]
 *`hbase.ipc.client.tcpnodelay`*::
 +
@@ -1372,7 +1372,7 @@ Set no delay on rpc socket connections. See
 .Default
 `true`
 
-  
+
 [[hbase.master.keytab.file]]
 *`hbase.master.keytab.file`*::
 +
@@ -1383,7 +1383,7 @@ Full path to the kerberos keytab file to use for logging in
 .Default
 ``
 
-  
+
 [[hbase.master.kerberos.principal]]
 *`hbase.master.kerberos.principal`*::
 +
@@ -1397,7 +1397,7 @@ Ex. "hbase/[email protected]". The kerberos principal name
 .Default
 ``
 
-  
+
 [[hbase.regionserver.keytab.file]]
 *`hbase.regionserver.keytab.file`*::
 +
@@ -1408,7 +1408,7 @@ Full path to the kerberos keytab file to use for logging in
 .Default
 ``
 
-  
+
 [[hbase.regionserver.kerberos.principal]]
 *`hbase.regionserver.kerberos.principal`*::
 +
@@ -1423,7 +1423,7 @@ Ex. "hbase/[email protected]".
The kerberos principal name
 .Default
 ``
 
-  
+
 [[hadoop.policy.file]]
 *`hadoop.policy.file`*::
 +
@@ -1435,7 +1435,7 @@ The policy configuration file used by RPC servers to make
 .Default
 `hbase-policy.xml`
 
-  
+
 [[hbase.superuser]]
 *`hbase.superuser`*::
 +
@@ -1447,7 +1447,7 @@ List of users or groups (comma-separated), who are allowed
 .Default
 ``
 
-  
+
 [[hbase.auth.key.update.interval]]
 *`hbase.auth.key.update.interval`*::
 +
@@ -1458,7 +1458,7 @@ The update interval for master key for authentication tokens
 .Default
 `86400000`
 
-  
+
 [[hbase.auth.token.max.lifetime]]
 *`hbase.auth.token.max.lifetime`*::
 +
@@ -1469,7 +1469,7 @@ The maximum lifetime in milliseconds after which an
 .Default
 `604800000`
 
-  
+
 [[hbase.ipc.client.fallback-to-simple-auth-allowed]]
 *`hbase.ipc.client.fallback-to-simple-auth-allowed`*::
 +
@@ -1484,7 +1484,7 @@ When a client is configured to attempt a secure connection, but attempts to
 .Default
 `false`
 
-  
+
 [[hbase.display.keys]]
 *`hbase.display.keys`*::
 +
@@ -1496,7 +1496,7 @@ When this is set to true the webUI and such will display all start/end keys
 .Default
 `true`
 
-  
+
 [[hbase.coprocessor.region.classes]]
 *`hbase.coprocessor.region.classes`*::
 +
@@ -1510,7 +1510,7 @@ A comma-separated list of Coprocessors that are loaded by
 .Default
 ``
 
-  
+
 [[hbase.rest.port]]
 *`hbase.rest.port`*::
 +
@@ -1520,7 +1520,7 @@ The port for the HBase REST server.
 .Default
 `8080`
 
-  
+
 [[hbase.rest.readonly]]
 *`hbase.rest.readonly`*::
 +
@@ -1532,7 +1532,7 @@ Defines the mode the REST server will be started in. Possible values are:
 .Default
 `false`
 
-  
+
 [[hbase.rest.threads.max]]
 *`hbase.rest.threads.max`*::
 +
@@ -1547,7 +1547,7 @@ The maximum number of threads of the REST server thread pool.
 .Default
 `100`
 
-  
+
 [[hbase.rest.threads.min]]
 *`hbase.rest.threads.min`*::
 +
@@ -1559,7 +1559,7 @@ The minimum number of threads of the REST server thread pool.
 .Default
 `2`
 
-  
+
 [[hbase.rest.support.proxyuser]]
 *`hbase.rest.support.proxyuser`*::
 +
@@ -1569,7 +1569,7 @@ Enables running the REST server to support proxy-user mode.
 .Default
 `false`
 
-  
+
 [[hbase.defaults.for.version.skip]]
 *`hbase.defaults.for.version.skip`*::
 +
@@ -1585,7 +1585,7 @@ Set to true to skip the 'hbase.defaults.for.version' check.
 .Default
 `false`
 
-  
+
 [[hbase.coprocessor.master.classes]]
 *`hbase.coprocessor.master.classes`*::
 +
@@ -1600,7 +1600,7 @@ A comma-separated list of
 .Default
 ``
 
-  
+
 [[hbase.coprocessor.abortonerror]]
 *`hbase.coprocessor.abortonerror`*::
 +
@@ -1615,7 +1615,7 @@ Set to true to cause the hosting server (master or regionserver)
 .Default
 `true`
 
-  
+
 [[hbase.online.schema.update.enable]]
 *`hbase.online.schema.update.enable`*::
 +
@@ -1625,7 +1625,7 @@ Set true to enable online schema changes.
 .Default
 `true`
 
-  
+
 [[hbase.table.lock.enable]]
 *`hbase.table.lock.enable`*::
 +
@@ -1637,7 +1637,7 @@ Set to true to enable locking the table in zookeeper for schema change operation
 .Default
 `true`
 
-  
+
 [[hbase.table.max.rowsize]]
 *`hbase.table.max.rowsize`*::
 +
@@ -1646,12 +1646,12 @@ Set to true to enable locking the table in zookeeper for schema change operation
 Maximum size of single row in bytes (default is 1 Gb) for Get'ting
 or Scan'ning without in-row scan flag set. If row size exceeds this limit
 RowTooBigException is thrown to client.
-  
+
 +
 .Default
 `1073741824`
 
-  
+
 [[hbase.thrift.minWorkerThreads]]
 *`hbase.thrift.minWorkerThreads`*::
 +
@@ -1662,7 +1662,7 @@ The "core size" of the thread pool. New threads are created on every
 .Default
 `16`
 
-  
+
 [[hbase.thrift.maxWorkerThreads]]
 *`hbase.thrift.maxWorkerThreads`*::
 +
@@ -1674,7 +1674,7 @@ The maximum size of the thread pool. When the pending request queue
 .Default
 `1000`
 
-  
+
 [[hbase.thrift.maxQueuedRequests]]
 *`hbase.thrift.maxQueuedRequests`*::
 +
@@ -1687,7 +1687,7 @@ The maximum number of pending Thrift connections waiting in the queue.
If .Default `1000` - + [[hbase.thrift.htablepool.size.max]] *`hbase.thrift.htablepool.size.max`*:: + @@ -1696,12 +1696,12 @@ The upper bound for the table pool used in the Thrift gateways server. Since this is per table name, we assume a single table and so with 1000 default worker threads max this is set to a matching number. For other workloads this number can be adjusted as needed. - + + .Default `1000` - + [[hbase.regionserver.thrift.framed]] *`hbase.regionserver.thrift.framed`*:: + @@ -1710,12 +1710,12 @@ Use Thrift TFramedTransport on the server side. This is the recommended transport for thrift servers and requires a similar setting on the client side. Changing this to false will select the default transport, vulnerable to DoS when malformed requests are issued due to THRIFT-601. - + + .Default `false` - + [[hbase.regionserver.thrift.framed.max_frame_size_in_mb]] *`hbase.regionserver.thrift.framed.max_frame_size_in_mb`*:: + @@ -1725,7 +1725,7 @@ Default frame size when using framed transport .Default `2` - + [[hbase.regionserver.thrift.compact]] *`hbase.regionserver.thrift.compact`*:: + @@ -1735,7 +1735,7 @@ Use Thrift TCompactProtocol binary serialization protocol. .Default `false` - + [[hbase.data.umask.enable]] *`hbase.data.umask.enable`*:: + @@ -1746,7 +1746,7 @@ Enable, if true, that file permissions should be assigned .Default `false` - + [[hbase.data.umask]] *`hbase.data.umask`*:: + @@ -1757,7 +1757,7 @@ File permissions that should be used to write data .Default `000` - + [[hbase.metrics.showTableName]] *`hbase.metrics.showTableName`*:: + @@ -1770,7 +1770,7 @@ Whether to include the prefix "tbl.tablename" in per-column family metrics. 
.Default `true` - + [[hbase.metrics.exposeOperationTimes]] *`hbase.metrics.exposeOperationTimes`*:: + @@ -1782,7 +1782,7 @@ Whether to report metrics about time taken performing an .Default `true` - + [[hbase.snapshot.enabled]] *`hbase.snapshot.enabled`*:: + @@ -1792,7 +1792,7 @@ Set to true to allow snapshots to be taken / restored / cloned. .Default `true` - + [[hbase.snapshot.restore.take.failsafe.snapshot]] *`hbase.snapshot.restore.take.failsafe.snapshot`*:: + @@ -1804,7 +1804,7 @@ Set to true to take a snapshot before the restore operation. .Default `true` - + [[hbase.snapshot.restore.failsafe.name]] *`hbase.snapshot.restore.failsafe.name`*:: + @@ -1816,7 +1816,7 @@ Name of the failsafe snapshot taken by the restore operation. .Default `hbase-failsafe-{snapshot.name}-{restore.timestamp}` - + [[hbase.server.compactchecker.interval.multiplier]] *`hbase.server.compactchecker.interval.multiplier`*:: + @@ -1831,7 +1831,7 @@ The number that determines how often we scan to see if compaction is necessary. .Default `1000` - + [[hbase.lease.recovery.timeout]] *`hbase.lease.recovery.timeout`*:: + @@ -1841,7 +1841,7 @@ How long we wait on dfs lease recovery in total before giving up. .Default `900000` - + [[hbase.lease.recovery.dfs.timeout]] *`hbase.lease.recovery.dfs.timeout`*:: + @@ -1855,7 +1855,7 @@ How long between dfs recover lease invocations. Should be larger than the sum of .Default `64000` - + [[hbase.column.max.version]] *`hbase.column.max.version`*:: + @@ -1866,7 +1866,7 @@ New column family descriptors will use this value as the default number of versi .Default `1` - + [[hbase.dfs.client.read.shortcircuit.buffer.size]] *`hbase.dfs.client.read.shortcircuit.buffer.size`*:: + @@ -1880,12 +1880,12 @@ If the DFSClient configuration direct memory. So, we set it down from the default. Make it > the default hbase block size set in the HColumnDescriptor which is usually 64k. 
- + + .Default `131072` - + [[hbase.regionserver.checksum.verify]] *`hbase.regionserver.checksum.verify`*:: + @@ -1900,13 +1900,13 @@ If the DFSClient configuration fails, we will switch back to using HDFS checksums (so do not disable HDFS checksums! And besides this feature applies to hfiles only, not to WALs). If this parameter is set to false, then hbase will not verify any checksums, - instead it will depend on checksum verification being done in the HDFS client. - + instead it will depend on checksum verification being done in the HDFS client. + + .Default `true` - + [[hbase.hstore.bytes.per.checksum]] *`hbase.hstore.bytes.per.checksum`*:: + @@ -1914,12 +1914,12 @@ If the DFSClient configuration Number of bytes in a newly created checksum chunk for HBase-level checksums in hfile blocks. - + + .Default `16384` - + [[hbase.hstore.checksum.algorithm]] *`hbase.hstore.checksum.algorithm`*:: + @@ -1927,12 +1927,12 @@ If the DFSClient configuration Name of an algorithm that is used to compute checksums. Possible values are NULL, CRC32, CRC32C. - + + .Default `CRC32` - + [[hbase.status.published]] *`hbase.status.published`*:: + @@ -1942,60 +1942,60 @@ If the DFSClient configuration When a region server dies and its recovery starts, the master will push this information to the client application, to let them cut the connection immediately instead of waiting for a timeout. - + + .Default `false` - + [[hbase.status.publisher.class]] *`hbase.status.publisher.class`*:: + .Description Implementation of the status publication with a multicast message. - + + .Default `org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher` - + [[hbase.status.listener.class]] *`hbase.status.listener.class`*:: + .Description Implementation of the status listener with a multicast message. 
- + + .Default `org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener` - + [[hbase.status.multicast.address.ip]] *`hbase.status.multicast.address.ip`*:: + .Description Multicast address to use for the status publication by multicast. - + + .Default `226.1.1.3` - + [[hbase.status.multicast.address.port]] *`hbase.status.multicast.address.port`*:: + .Description Multicast port to use for the status publication by multicast. - + + .Default `16100` - + [[hbase.dynamic.jars.dir]] *`hbase.dynamic.jars.dir`*:: + @@ -2005,12 +2005,12 @@ If the DFSClient configuration dynamically by the region server without the need to restart. However, an already loaded filter/co-processor class would not be un-loaded. See HBASE-1936 for more details. - + + .Default `${hbase.rootdir}/lib` - + [[hbase.security.authentication]] *`hbase.security.authentication`*:: + @@ -2018,24 +2018,24 @@ If the DFSClient configuration Controls whether or not secure authentication is enabled for HBase. Possible values are 'simple' (no authentication), and 'kerberos'. - + + .Default `simple` - + [[hbase.rest.filter.classes]] *`hbase.rest.filter.classes`*:: + .Description Servlet filters for REST service. - + + .Default `org.apache.hadoop.hbase.rest.filter.GzipFilter` - + [[hbase.master.loadbalancer.class]] *`hbase.master.loadbalancer.class`*:: + @@ -2046,12 +2046,12 @@ If the DFSClient configuration http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html It replaces the DefaultLoadBalancer as the default (since renamed as the SimpleLoadBalancer). - + + .Default `org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer` - + [[hbase.security.exec.permission.checks]] *`hbase.security.exec.permission.checks`*:: + @@ -2067,28 +2067,28 @@ If the DFSClient configuration section of the HBase online manual. For more information on granting or revoking permissions using the AccessController, see the security section of the HBase online manual. 
- + + .Default `false` - + [[hbase.procedure.regionserver.classes]] *`hbase.procedure.regionserver.classes`*:: + .Description -A comma-separated list of - org.apache.hadoop.hbase.procedure.RegionServerProcedureManager procedure managers that are - loaded by default on the active HRegionServer process. The lifecycle methods (init/start/stop) - will be called by the active HRegionServer process to perform the specific globally barriered - procedure. After implementing your own RegionServerProcedureManager, just put it in +A comma-separated list of + org.apache.hadoop.hbase.procedure.RegionServerProcedureManager procedure managers that are + loaded by default on the active HRegionServer process. The lifecycle methods (init/start/stop) + will be called by the active HRegionServer process to perform the specific globally barriered + procedure. After implementing your own RegionServerProcedureManager, just put it in HBase's classpath and add the fully qualified class name here. - + + .Default `` - + [[hbase.procedure.master.classes]] *`hbase.procedure.master.classes`*:: + @@ -2103,7 +2103,7 @@ A comma-separated list of .Default `` - + [[hbase.coordinated.state.manager.class]] *`hbase.coordinated.state.manager.class`*:: + @@ -2113,7 +2113,7 @@ Fully qualified name of class implementing coordinated state manager. .Default `org.apache.hadoop.hbase.coordination.ZkCoordinatedStateManager` - + [[hbase.regionserver.storefile.refresh.period]] *`hbase.regionserver.storefile.refresh.period`*:: + @@ -2126,12 +2126,12 @@ Fully qualified name of class implementing coordinated state manager. extra Namenode pressure. If the files cannot be refreshed for longer than HFile TTL (hbase.master.hfilecleaner.ttl) the requests are rejected. Configuring HFile TTL to a larger value is also recommended with this setting. 
- + + .Default `0` - + [[hbase.region.replica.replication.enabled]] *`hbase.region.replica.replication.enabled`*:: + @@ -2142,33 +2142,33 @@ Fully qualified name of class implementing coordinated state manager. which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or the ReplicationAdmin java class. - Replication to secondary region replicas works over standard inter-cluster replication. - So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication" + Replication to secondary region replicas works over standard inter-cluster replication. + So if replication is explicitly disabled, it has to be re-enabled by setting "hbase.replication" to true for this feature to work. - + + .Default `false` - + [[hbase.http.filter.initializers]] *`hbase.http.filter.initializers`*:: + .Description - A comma separated list of class names. Each class in the list must extend - org.apache.hadoop.hbase.http.FilterInitializer. The corresponding Filter will - be initialized. Then, the Filter will be applied to all user facing jsp - and servlet web pages. + A comma-separated list of class names. Each class in the list must extend + org.apache.hadoop.hbase.http.FilterInitializer. The corresponding Filter will + be initialized. Then, the Filter will be applied to all user-facing jsp + and servlet web pages. The ordering of the list defines the ordering of the filters. - The default StaticUserWebFilter add a user principal as defined by the + The default StaticUserWebFilter adds a user principal as defined by the hbase.http.staticuser.user property. 
- + + .Default `org.apache.hadoop.hbase.http.lib.StaticUserWebFilter` - + [[hbase.security.visibility.mutations.checkauths]] *`hbase.security.visibility.mutations.checkauths`*:: + @@ -2176,41 +2176,41 @@ Fully qualified name of class implementing coordinated state manager. This property if enabled, will check whether the labels in the visibility expression are associated with the user issuing the mutation - + + .Default `false` - + [[hbase.http.max.threads]] *`hbase.http.max.threads`*:: + .Description - The maximum number of threads that the HTTP Server will create in its + The maximum number of threads that the HTTP Server will create in its ThreadPool. - + + .Default `10` - + [[hbase.replication.rpc.codec]] *`hbase.replication.rpc.codec`*:: + .Description The codec that is to be used when replication is enabled so that - the tags are also replicated. This is used along with HFileV3 which + the tags are also replicated. This is used along with HFileV3 which supports tags in them. If tags are not used or if the hfile version used is HFileV2 then KeyValueCodec can be used as the replication codec. Note that using KeyValueCodecWithTags for replication when there are no tags causes no harm. - + + .Default `org.apache.hadoop.hbase.codec.KeyValueCodecWithTags` - + [[hbase.http.staticuser.user]] *`hbase.http.staticuser.user`*:: + @@ -2219,12 +2219,12 @@ Fully qualified name of class implementing coordinated state manager. The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI (user to be used for browsing files). - + + .Default `dr.stack` - + [[hbase.regionserver.handler.abort.on.error.percent]] *`hbase.regionserver.handler.abort.on.error.percent`*:: +
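The defaults described in this chapter come from hbase-default.xml; site-specific overrides belong in hbase-site.xml. A minimal sketch, with illustrative values only (the property names are taken from the descriptions above):

[source,xml]
----
<!-- hbase-site.xml: site-specific overrides of hbase-default.xml -->
<configuration>
  <!-- switch from the 'simple' default to Kerberos authentication -->
  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- example only: raise the HTTP server thread pool from its default of 10 -->
  <property>
    <name>hbase.http.max.threads</name>
    <value>16</value>
  </property>
</configuration>
----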
http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/hbase_history.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbase_history.adoc b/src/main/asciidoc/_chapters/hbase_history.adoc index de4aff5..7308b90 100644 --- a/src/main/asciidoc/_chapters/hbase_history.adoc +++ b/src/main/asciidoc/_chapters/hbase_history.adoc @@ -29,9 +29,9 @@ :icons: font :experimental: -* 2006: link:http://research.google.com/archive/bigtable.html[BigTable] paper published by Google. -* 2006 (end of year): HBase development starts. -* 2008: HBase becomes Hadoop sub-project. -* 2010: HBase becomes Apache top-level project. +* 2006: link:http://research.google.com/archive/bigtable.html[BigTable] paper published by Google. +* 2006 (end of year): HBase development starts. +* 2008: HBase becomes Hadoop sub-project. +* 2010: HBase becomes Apache top-level project. :numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/hbck_in_depth.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/hbck_in_depth.adoc b/src/main/asciidoc/_chapters/hbck_in_depth.adoc index 1b30c59..3afbca0 100644 --- a/src/main/asciidoc/_chapters/hbck_in_depth.adoc +++ b/src/main/asciidoc/_chapters/hbck_in_depth.adoc @@ -29,7 +29,7 @@ :experimental: HBaseFsck (hbck) is a tool for checking for region consistency and table integrity problems and repairing a corrupted HBase. -It works in two basic modes -- a read-only inconsistency identifying mode and a multi-phase read-write repair mode. +It works in two basic modes -- a read-only inconsistency identifying mode and a multi-phase read-write repair mode. 
=== Running hbck to identify inconsistencies @@ -45,7 +45,7 @@ At the end of the commands output it prints OK or tells you the number of INCONS You may also want to run hbck a few times because some inconsistencies can be transient (e.g. cluster is starting up or a region is splitting). Operationally you may want to run hbck regularly and set up an alert (e.g. via Nagios) if it repeatedly reports inconsistencies. A run of hbck will report a list of inconsistencies along with a brief description of the regions and tables affected. -The using the `-details` option will report more details including a representative listing of all the splits present in all the tables. +Using the `-details` option will report more details, including a representative listing of all the splits present in all the tables. [source,bourne] ---- @@ -66,9 +66,9 @@ $ ./bin/hbase hbck TableFoo TableBar === Inconsistencies If after several runs, inconsistencies continue to be reported, you may have encountered a corruption. -These should be rare, but in the event they occur newer versions of HBase include the hbck tool enabled with automatic repair options. +These should be rare, but in the event they occur, newer versions of HBase include the hbck tool enabled with automatic repair options. -There are two invariants that when violated create inconsistencies in HBase: +There are two invariants that, when violated, create inconsistencies in HBase: * HBase's region consistency invariant is satisfied if every region is assigned and deployed on exactly one region server, and all places where this state is kept are in accord. * HBase's table integrity invariant is satisfied if for each table, every possible row key resolves to exactly one region. @@ -77,20 +77,20 @@ Repairs generally work in three phases -- a read-only information gathering phas Starting from version 0.90.0, hbck could detect region consistency problems and report on a subset of possible table integrity problems. 
It also included the ability to automatically fix the most common inconsistency, region assignment and deployment consistency problems. This repair could be done by using the `-fix` command line option. -These problems close regions if they are open on the wrong server or on multiple region servers and also assigns regions to region servers if they are not open. +These fixes close regions if they are open on the wrong server or on multiple region servers, and also assign regions to region servers if they are not open. Starting from HBase versions 0.90.7, 0.92.2 and 0.94.0, several new command line options were introduced to aid in repairing a corrupted HBase. -This hbck sometimes goes by the nickname ``uberhbck''. Each particular version of uber hbck is compatible with the HBase's of the same major version (0.90.7 uberhbck can repair a 0.90.4). However, versions <=0.90.6 and versions <=0.92.1 may require restarting the master or failing over to a backup master. +This hbck sometimes goes by the nickname ``uberhbck''. Each particular version of uberhbck is compatible with HBase installs of the same major version (the 0.90.7 uberhbck can repair a 0.90.4 install). However, versions <=0.90.6 and versions <=0.92.1 may require restarting the master or failing over to a backup master. === Localized repairs When repairing a corrupted HBase, it is best to repair the lowest risk inconsistencies first. These are generally region consistency repairs -- localized single-region repairs that only modify in-memory data, ephemeral zookeeper data, or patch holes in the META table. Region consistency requires that the HBase instance has the state of the region's data in HDFS (.regioninfo files), the region's row in the hbase:meta table, and the region's deployment/assignments on region servers and the master in accord. 
-Options for repairing region consistency include: +Options for repairing region consistency include: * `-fixAssignments` (equivalent to the 0.90 `-fix` option) repairs unassigned, incorrectly assigned or multiply assigned regions. -* `-fixMeta` which removes meta rows when corresponding regions are not present in HDFS and adds new meta rows if they regions are present in HDFS while not in META. To fix deployment and assignment problems you can run this command: +* `-fixMeta` which removes meta rows when corresponding regions are not present in HDFS, and adds new meta rows if the regions are present in HDFS but not in META. To fix deployment and assignment problems you can run this command: [source,bourne] ---- @@ -205,8 +205,8 @@ However, there could be some lingering offline split parents sometimes. They are in META, in HDFS, and not deployed. But HBase can't clean them up. In this case, you can use the `-fixSplitParents` option to reset them in META to be online and not split. -Therefore, hbck can merge them with other regions if fixing overlapping regions option is used. +hbck can then merge them with other regions if the option for fixing overlapping regions is used. -This option should not normally be used, and it is not in `-fixAll`. +This option should not normally be used, and it is not in `-fixAll`. 
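The repair options above compose on one command line. A sketch of a cautious, low-risk session (illustrative only; re-run the read-only check after each fix and stop once it reports OK):

[source,bourne]
----
# read-only check first; repeat to rule out transient inconsistencies
$ ./bin/hbase hbck

# repair assignment/deployment problems only (lowest risk)
$ ./bin/hbase hbck -fixAssignments

# also patch holes in hbase:meta from the region state found in HDFS
$ ./bin/hbase hbck -fixAssignments -fixMeta
----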
:numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/other_info.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/other_info.adoc b/src/main/asciidoc/_chapters/other_info.adoc index 046b747..6143876 100644 --- a/src/main/asciidoc/_chapters/other_info.adoc +++ b/src/main/asciidoc/_chapters/other_info.adoc @@ -31,50 +31,50 @@ [[other.info.videos]] === HBase Videos -.Introduction to HBase -* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). -* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010). -link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011). +.Introduction to HBase +* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). +* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010). +link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011). -link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010). 
+link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010). [[other.info.pres]] === HBase Presentations (Slides) -link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011). +link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011). -link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). +link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). -link:http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install[Getting The Most From Your HBase Install] by Ryan Rawson, Jonathan Gray (Hadoop World 2009). +link:http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install[Getting The Most From Your HBase Install] by Ryan Rawson, Jonathan Gray (Hadoop World 2009). [[other.info.papers]] === HBase Papers -link:http://research.google.com/archive/bigtable.html[BigTable] by Google (2006). +link:http://research.google.com/archive/bigtable.html[BigTable] by Google (2006). -link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS Locality] by Lars George (2010). +link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS Locality] by Lars George (2010). 
-link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases] by Ian Varley (2009). +link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases] by Ian Varley (2009). [[other.info.sites]] === HBase Sites -link:http://www.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information. +link:http://www.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information. -* link:http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems. +* link:http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems. -link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations. +link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations. -link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone. +link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone. [[other.info.books]] === HBase Books -link:http://shop.oreilly.com/product/0636920014348.do[HBase: The Definitive Guide] by Lars George. +link:http://shop.oreilly.com/product/0636920014348.do[HBase: The Definitive Guide] by Lars George. [[other.info.books.hadoop]] === Hadoop Books -link:http://shop.oreilly.com/product/9780596521981.do[Hadoop: The Definitive Guide] by Tom White. +link:http://shop.oreilly.com/product/9780596521981.do[Hadoop: The Definitive Guide] by Tom White. 
:numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/performance.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc index bf0e790..c68d882 100644 --- a/src/main/asciidoc/_chapters/performance.adoc +++ b/src/main/asciidoc/_chapters/performance.adoc @@ -102,12 +102,12 @@ Are all the network interfaces functioning correctly? Are you sure? See the Trou [[perf.network.call_me_maybe]] === Network Consistency and Partition Tolerance -The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three charateristics: -- *C*onsistency -- all nodes see the same data. +The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics: +- *C*onsistency -- all nodes see the same data. - *A*vailability -- every request receives a response about whether it succeeded or failed. - *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others. -HBase favors consistency and partition tolerance, where a decision has to be made. Coda Hale explains why partition tolerance is so important, in http://codahale.com/you-cant-sacrifice-partition-tolerance/. +Where a decision has to be made, HBase favors consistency and partition tolerance. Coda Hale explains why partition tolerance is so important in http://codahale.com/you-cant-sacrifice-partition-tolerance/. Robert Yokota used an automated testing framework called link:https://aphyr.com/tags/jepsen[Jepsen] to test HBase's partition tolerance in the face of network partitions, using techniques modeled after Aphyr's link:https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions[Call Me Maybe] series. 
The results, available as a link:https://rayokota.wordpress.com/2015/09/30/call-me-maybe-hbase/[blog post] and an link:https://rayokota.wordpress.com/2015/09/30/call-me-maybe-hbase-addendum/[addendum], show that HBase performs correctly. @@ -782,7 +782,8 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation. For a large number of deletes, consider `Table.delete(List)`. -See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29 +See ++++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29">hbase.client.Delete</a>+++. [[perf.hdfs]] == HDFS http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/rpc.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/rpc.adoc b/src/main/asciidoc/_chapters/rpc.adoc index c319f39..ee53795 100644 --- a/src/main/asciidoc/_chapters/rpc.adoc +++ b/src/main/asciidoc/_chapters/rpc.adoc @@ -47,7 +47,7 @@ For more background on how we arrived at this spec., see link:https://docs.googl . A wire-format we can evolve -. A format that does not require our rewriting server core or radically changing its current architecture (for later). +. A format that does not require rewriting the server core or radically changing its current architecture (for later). === TODO @@ -58,7 +58,7 @@ For more background on how we arrived at this spec., see link:https://docs.googl . Diagram on how it works . A grammar that succinctly describes the wire-format. Currently we have these words and the content of the rpc protobuf idl but a grammar for the back and forth would help with grokking RPC. - Also, a little state machine on client/server interactions would help with understanding (and ensuring correct implementation). 
+ Also, a little state machine on client/server interactions would help with understanding (and ensuring correct implementation). === RPC @@ -79,7 +79,7 @@ link:https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=blob;f=hbase-protocol Client initiates connection. ===== Client -On connection setup, client sends a preamble followed by a connection header. +On connection setup, client sends a preamble followed by a connection header. .<preamble> [source] @@ -191,7 +191,7 @@ Doing header+param rather than a single protobuf Message with both header and pa . Is closer to what we currently have . Having a single fat pb requires extra copying putting the already pb'd param into the body of the fat request pb (and same making result) . We can decide whether to accept the request or not before we read the param; for example, the request might be low priority. - As is, we read header+param in one go as server is currently implemented so this is a TODO. + As is, we read header+param in one go as server is currently implemented so this is a TODO. The advantages are minor. If later, fat request has clear advantage, can roll out a v2 later. @@ -205,13 +205,13 @@ Codec must implement hbase's `Codec` Interface. After connection setup, all passed cellblocks will be sent with this codec. The server will return cellblocks using this same codec as long as the codec is on the servers' CLASSPATH (else you will get `UnsupportedCellCodecException`). -To change the default codec, set `hbase.client.default.rpc.codec`. +To change the default codec, set `hbase.client.default.rpc.codec`. To disable cellblocks completely and to go pure protobuf, set the default to the empty String and do not specify a codec in your Configuration. So, set `hbase.client.default.rpc.codec` to the empty string and do not set `hbase.client.rpc.codec`. This will cause the client to connect to the server with no codec specified. If a server sees no codec, it will return all responses in pure protobuf. 
-Running pure protobuf all the time will be slower than running with cellblocks. +Running pure protobuf all the time will be slower than running with cellblocks. .Compression Uses Hadoop's compression codecs. http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/schema_design.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc index a212a5c..f2ed234 100644 --- a/src/main/asciidoc/_chapters/schema_design.adoc +++ b/src/main/asciidoc/_chapters/schema_design.adoc @@ -733,10 +733,12 @@ Composite Rowkey With Numeric Substitution: For this approach another lookup table would be needed in addition to LOG_DATA, called LOG_TYPES. The rowkey of LOG_TYPES would be: -* [type] (e.g., byte indicating hostname vs. event-type) -* [bytes] variable length bytes for raw hostname or event-type. +* `[type]` (e.g., byte indicating hostname vs. event-type) +* `[bytes]` variable length bytes for raw hostname or event-type. -A column for this rowkey could be a long with an assigned number, which could be obtained by using an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29[HBase counter]. +A column for this rowkey could be a long with an assigned number, which could be obtained +by using an ++++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</a>+++. So the resulting composite rowkey would be: @@ -751,7 +753,9 @@ In either the Hash or Numeric substitution approach, the raw values for hostname This effectively is the OpenTSDB approach. What OpenTSDB does is re-write data and pack rows into columns for certain time-periods. 
-For a detailed explanation, see: link:http://opentsdb.net/schema.html, and link:http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html[Lessons Learned from OpenTSDB] from HBaseCon2012. +For a detailed explanation, see: link:http://opentsdb.net/schema.html, and ++++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++ +from HBaseCon2012. But this is how the general concept works: data is ingested, for example, in this manner... @@ -854,14 +858,14 @@ The ORDER table's rowkey was described above: <<schema.casestudies.custorder,sch The SHIPPING_LOCATION's composite rowkey would be something like this: -* [order-rowkey] -* [shipping location number] (e.g., 1st location, 2nd, etc.) +* `[order-rowkey]` +* `[shipping location number]` (e.g., 1st location, 2nd, etc.) The LINE_ITEM table's composite rowkey would be something like this: -* [order-rowkey] -* [shipping location number] (e.g., 1st location, 2nd, etc.) -* [line item number] (e.g., 1st lineitem, 2nd, etc.) +* `[order-rowkey]` +* `[shipping location number]` (e.g., 1st location, 2nd, etc.) +* `[line item number]` (e.g., 1st lineitem, 2nd, etc.) Such a normalized model is likely to be the approach with an RDBMS, but that's not your only option with HBase. The cons of such an approach is that to retrieve information about any Order, you will need: @@ -879,21 +883,21 @@ With this approach, there would exist a single table ORDER that would contain The Order rowkey was described above: <<schema.casestudies.custorder,schema.casestudies.custorder>> -* [order-rowkey] -* [ORDER record type] +* `[order-rowkey]` +* `[ORDER record type]` The ShippingLocation composite rowkey would be something like this: -* [order-rowkey] -* [SHIPPING record type] -* [shipping location number] (e.g., 1st location, 2nd, etc.) 
+* `[order-rowkey]` +* `[SHIPPING record type]` +* `[shipping location number]` (e.g., 1st location, 2nd, etc.) The LineItem composite rowkey would be something like this: -* [order-rowkey] -* [LINE record type] -* [shipping location number] (e.g., 1st location, 2nd, etc.) -* [line item number] (e.g., 1st lineitem, 2nd, etc.) +* `[order-rowkey]` +* `[LINE record type]` +* `[shipping location number]` (e.g., 1st location, 2nd, etc.) +* `[line item number]` (e.g., 1st lineitem, 2nd, etc.) [[schema.casestudies.custorder.obj.denorm]] ===== Denormalized @@ -902,9 +906,9 @@ A variant of the Single Table With Record Types approach is to denormalize and f The LineItem composite rowkey would be something like this: -* [order-rowkey] -* [LINE record type] -* [line item number] (e.g., 1st lineitem, 2nd, etc., care must be taken that there are unique across the entire order) +* `[order-rowkey]` +* `[LINE record type]` +* `[line item number]` (e.g., 1st lineitem, 2nd, etc., care must be taken that these are unique across the entire order) and the LineItem columns would be something like this: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/security.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc index fb2a6b0..acc23d7 100644 --- a/src/main/asciidoc/_chapters/security.adoc +++ b/src/main/asciidoc/_chapters/security.adoc @@ -1332,11 +1332,21 @@ static Table createTableAndWriteDataWithLabels(TableName tableName, String... la ---- ==== -<<reading_cells_with_labels>> +[[reading_cells_with_labels]] ==== Reading Cells with Labels -When you issue a Scan or Get, HBase uses your default set of authorizations to filter out cells that you do not have access to. 
A superuser can set the default set of authorizations for a given user by using the `set_auths` HBase Shell command or the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.hbase.client.Connection,%20java.lang.String[],%20java.lang.String)[VisibilityClient.setAuths()] method. -You can specify a different authorization during the Scan or Get, by passing the AUTHORIZATIONS option in HBase Shell, or the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()] method if you use the API. This authorization will be combined with your default set as an additional filter. It will further filter your results, rather than giving you additional authorization. +When you issue a Scan or Get, HBase uses your default set of authorizations to +filter out cells that you do not have access to. A superuser can set the default +set of authorizations for a given user by using the `set_auths` HBase Shell command +or the +link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.hbase.client.Connection,%20java.lang.String\[\],%20java.lang.String)[VisibilityClient.setAuths()] method. + +You can specify a different authorization during the Scan or Get, by passing the +AUTHORIZATIONS option in HBase Shell, or the +link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()] +method if you use the API. This authorization will be combined with your default +set as an additional filter. It will further filter your results, rather than +giving you additional authorization. 
.HBase Shell ==== @@ -1582,8 +1592,10 @@ Rotate the Master Key:: === Secure Bulk Load Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase. -Secure bulk loading is implemented by a coprocessor, named link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html -[SecureBulkLoadEndpoint], which uses a staging directory configured by the configuration property `hbase.bulkload.staging.dir`, which defaults to _/tmp/hbase-staging/_. +Secure bulk loading is implemented by a coprocessor, named +link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint], +which uses a staging directory configured by the configuration property `hbase.bulkload.staging.dir`, which defaults to +_/tmp/hbase-staging/_. .Secure Bulk Load Algorithm http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/tracing.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/tracing.adoc b/src/main/asciidoc/_chapters/tracing.adoc index 9b3711e..0cddd8a 100644 --- a/src/main/asciidoc/_chapters/tracing.adoc +++ b/src/main/asciidoc/_chapters/tracing.adoc @@ -31,12 +31,12 @@ :experimental: link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://htrace.incubator.apache.org/[HTrace]. -Setting up tracing is quite simple, however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement). +Setting up tracing is quite simple; however, it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement). 
[[tracing.spanreceivers]] === SpanReceivers -The tracing system works by collecting information in structures called 'Spans'. It is up to you to choose how you want to receive this information by implementing the `SpanReceiver` interface, which defines one method: +The tracing system works by collecting information in structures called 'Spans'. It is up to you to choose how you want to receive this information by implementing the `SpanReceiver` interface, which defines one method: [source] ---- @@ -45,12 +45,12 @@ public void receiveSpan(Span span); ---- This method serves as a callback whenever a span is completed. -HTrace allows you to use as many SpanReceivers as you want so you can easily send trace information to multiple destinations. +HTrace allows you to use as many SpanReceivers as you want so you can easily send trace information to multiple destinations. -Configure what SpanReceivers you'd like to us by putting a comma separated list of the fully-qualified class name of classes implementing `SpanReceiver` in _hbase-site.xml_ property: `hbase.trace.spanreceiver.classes`. +Configure what SpanReceivers you'd like to use by putting a comma-separated list of the fully-qualified class names of classes implementing `SpanReceiver` in the _hbase-site.xml_ property `hbase.trace.spanreceiver.classes`. HTrace includes a `LocalFileSpanReceiver` that writes all span information to local files in a JSON-based format. -The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-file-span-receiver.path` property with a value describing the name of the file to which nodes should write their span information. +The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-file-span-receiver.path` property with a value describing the name of the file to which nodes should write their span information. 
[source] ---- @@ -65,7 +65,7 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi </property> ---- -HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and send them to Zipkin server. In order to use this span receiver, you need to install the jar of htrace-zipkin to your HBase's classpath on all of the nodes in your cluster. +HTrace also provides a `ZipkinSpanReceiver`, which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and sends them to a Zipkin server. In order to use this span receiver, you need to install the htrace-zipkin jar on the HBase classpath on all of the nodes in your cluster. _htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes. @@ -77,11 +77,11 @@ _htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7 <property> <name>hbase.trace.spanreceiver.classes</name> <value>org.apache.htrace.impl.ZipkinSpanReceiver</value> -</property> +</property> <property> <name>hbase.htrace.zipkin.collector-hostname</name> <value>localhost</value> -</property> +</property> <property> <name>hbase.htrace.zipkin.collector-port</name> <value>9410</value> @@ -93,7 +93,7 @@ If you do not want to use the included span receivers, you are encouraged to wri [[tracing.client.modifications]] == Client Modifications -In order to turn on tracing in your client code, you must initialize the module sending spans to receiver once per client process. +In order to turn on tracing in your client code, you must initialize the module that sends spans to the receiver once per client process. 
[source,java] ---- @@ -107,7 +107,7 @@ private SpanReceiverHost spanReceiverHost; ---- Then you simply start tracing span before requests you think are interesting, and close it when the request is done. -For example, if you wanted to trace all of your get operations, you change this: +For example, if you wanted to trace all of your get operations, you would change this: [source,java] ---- @@ -118,7 +118,7 @@ Get get = new Get(Bytes.toBytes("r1")); Result res = table.get(get); ---- -into: +into: [source,java] ---- @@ -133,7 +133,7 @@ try { } ---- -If you wanted to trace half of your 'get' operations, you would pass in: +If you wanted to trace half of your 'get' operations, you would pass in: [source,java] ---- @@ -142,12 +142,12 @@ new ProbabilitySampler(0.5) ---- in lieu of `Sampler.ALWAYS` to `Trace.startSpan()`. -See the HTrace _README_ for more information on Samplers. +See the HTrace _README_ for more information on Samplers. [[tracing.client.shell]] == Tracing from HBase Shell -You can use `trace` command for tracing requests from HBase Shell. `trace 'start'` command turns on tracing and `trace 'stop'` command turns off tracing. +You can use the `trace` command for tracing requests from HBase Shell. The `trace 'start'` command turns on tracing and the `trace 'stop'` command turns it off. [source] ---- @@ -158,7 +158,7 @@ hbase(main):003:0> trace 'stop' ---- `trace 'start'` and `trace 'stop'` always returns boolean value representing if or not there is ongoing tracing. -As a result, `trace 'stop'` returns false on success. `trace 'status'` just returns if or not tracing is turned on. +As a result, `trace 'stop'` returns false on success. `trace 'status'` just returns whether or not tracing is turned on. 
[source] ---- http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/unit_testing.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/unit_testing.adoc b/src/main/asciidoc/_chapters/unit_testing.adoc index 3f70001..ded237a 100644 --- a/src/main/asciidoc/_chapters/unit_testing.adoc +++ b/src/main/asciidoc/_chapters/unit_testing.adoc @@ -47,7 +47,7 @@ public class MyHBaseDAO { Put put = createPut(obj); table.put(put); } - + private static Put createPut(HBaseTestObj obj) { Put put = new Put(Bytes.toBytes(obj.getRowKey())); put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"), @@ -96,7 +96,7 @@ public class TestMyHbaseDAOData { These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values. Of course, JUnit can do much more than this. -For an introduction to JUnit, see link:https://github.com/junit-team/junit/wiki/Getting-started. +For an introduction to JUnit, see link:https://github.com/junit-team/junit/wiki/Getting-started. == Mockito @@ -133,7 +133,7 @@ public class TestMyHBaseDAO{ Configuration config = HBaseConfiguration.create(); @Mock Connection connection = ConnectionFactory.createConnection(config); - @Mock + @Mock private Table table; @Captor private ArgumentCaptor putCaptor; @@ -150,7 +150,7 @@ MyHBaseDAO.insertRecord(table, obj); verify(table).put(putCaptor.capture()); Put put = putCaptor.getValue(); - + assertEquals(Bytes.toString(put.getRow()), obj.getRowKey()); assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"))); assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2"))); @@ -197,7 +197,7 @@ public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable> } ---- -To test this code, the first step is to add a dependency to MRUnit to your Maven POM file. +To test this code, the first step is to add a dependency on MRUnit to your Maven POM file. 
[source,xml] ---- @@ -225,16 +225,16 @@ public class MyReducerTest { MyReducer reducer = new MyReducer(); reduceDriver = ReduceDriver.newReduceDriver(reducer); } - + @Test public void testHBaseInsert() throws IOException { - String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1", + String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1", strValue2 = "DATA2"; List<Text> list = new ArrayList<Text>(); list.add(new Text(strValue)); list.add(new Text(strValue1)); list.add(new Text(strValue2)); - //since in our case all that the reducer is doing is appending the records that the mapper + //since in our case all that the reducer is doing is appending the records that the mapper //sends it, we should get the following back String expectedOutput = strValue + strValue1 + strValue2; //Setup Input, mimic what mapper would have passed @@ -242,10 +242,10 @@ strValue2 = "DATA2"; reduceDriver.withInput(new Text(strKey), list); //run the reducer and get its output List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run(); - + //extract key from result and verify assertEquals(Bytes.toString(result.get(0).getFirst().get()), strKey); - + //extract value for CF/QUALIFIER and verify Put a = (Put)result.get(0).getSecond(); String c = Bytes.toString(a.get(CF, QUALIFIER).get(0).getValue()); @@ -283,7 +283,7 @@ Check the versions to be sure they are appropriate. <type>test-jar</type> <scope>test</scope> </dependency> - + <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> @@ -309,7 +309,7 @@ public class MyHBaseIntegrationTest { private static HBaseTestingUtility utility; byte[] CF = "CF".getBytes(); byte[] QUALIFIER = "CQ-1".getBytes(); - + @Before public void setup() throws Exception { utility = new HBaseTestingUtility(); @@ -343,7 +343,7 @@ This code creates an HBase mini-cluster and starts it. Next, it creates a table called `MyTest` with one column family, `CF`. 
A record is inserted, a Get is performed from the same table, and the insertion is verified. -NOTE: Starting the mini-cluster takes about 20-30 seconds, but that should be appropriate for integration testing. +NOTE: Starting the mini-cluster takes about 20-30 seconds, but that should be appropriate for integration testing. To use an HBase mini-cluster on Microsoft Windows, you need to use a Cygwin environment.
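The composite rowkeys walked through in the schema_design.adoc hunks above (`[order-rowkey]` + record type + sequence numbers) can be sketched in plain Java. The class name, one-byte type markers, and fixed-width integer fields below are illustrative assumptions for this sketch, not HBase API:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CompositeRowKeySketch {
    // Assumed one-byte record-type markers, not values defined by HBase
    static final byte ORDER = 1, SHIPPING = 2, LINE = 3;

    // Build a LineItem-style key: [order-rowkey][record type][shipping loc #][line item #]
    public static byte[] lineItemKey(byte[] orderRowKey, int shippingLoc, int lineItem) {
        ByteBuffer buf = ByteBuffer.allocate(orderRowKey.length + 1 + 4 + 4);
        buf.put(orderRowKey);     // variable-length order rowkey prefix
        buf.put(LINE);            // one-byte record type
        buf.putInt(shippingLoc);  // fixed-width big-endian ints keep byte order == numeric order
        buf.putInt(lineItem);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] order = "ORD-0001".getBytes();
        byte[] k1 = lineItemKey(order, 1, 1);
        byte[] k2 = lineItemKey(order, 1, 2);
        // Keys sharing an order prefix and location sort by line item number
        System.out.println(Arrays.compare(k1, k2) < 0); // prints: true
    }
}
```

Because all record types for an order share the `[order-rowkey]` prefix, a single prefix scan retrieves the whole order, which is the point of the single-table design described above.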

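The visibility-label behavior described in the security.adoc hunk above (an AUTHORIZATIONS set passed on a Scan or Get further filters your default set rather than extending it) amounts to set intersection. A toy model of that semantics only; the names are invented here and real enforcement happens inside HBase, server-side:

```java
import java.util.HashSet;
import java.util.Set;

public class AuthFilterSketch {
    // Effective authorizations = default set narrowed by the per-request set.
    // Toy illustration: a request can only remove labels, never add them.
    public static Set<String> effective(Set<String> defaults, Set<String> requested) {
        Set<String> out = new HashSet<>(defaults);
        out.retainAll(requested);
        return out;
    }

    public static void main(String[] args) {
        Set<String> defaults = Set.of("secret", "topsecret");
        Set<String> requested = Set.of("secret", "public");
        // "public" was never granted by the superuser, so it does not appear
        System.out.println(effective(defaults, requested)); // prints: [secret]
    }
}
```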