HBASE-14823 HBase Ref Guide Refactoring Some tables, links, and other output do not render correctly, either because of Asciidoc markup mistakes or poor formatting choices. Make improvements.
Project: http://git-wip-us.apache.org/repos/asf/hbase/repo Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/623dc130 Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/623dc130 Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/623dc130 Branch: refs/heads/hbase-12439 Commit: 623dc1303eee55610659f9e7e5a4a9149630adfe Parents: 1b13bfc Author: Misty Stanley-Jones <[email protected]> Authored: Tue Nov 17 11:14:56 2015 +1000 Committer: Misty Stanley-Jones <[email protected]> Committed: Wed Nov 18 14:14:37 2015 +1000 ---------------------------------------------------------------------- .../asciidoc/_chapters/appendix_acl_matrix.adoc | 2 +- .../_chapters/appendix_hfile_format.adoc | 7 +- src/main/asciidoc/_chapters/architecture.adoc | 184 ++++--- src/main/asciidoc/_chapters/asf.adoc | 4 +- src/main/asciidoc/_chapters/community.adoc | 34 +- src/main/asciidoc/_chapters/compression.adoc | 36 +- src/main/asciidoc/_chapters/configuration.adoc | 16 +- src/main/asciidoc/_chapters/datamodel.adoc | 2 +- src/main/asciidoc/_chapters/faq.adoc | 20 +- .../asciidoc/_chapters/getting_started.adoc | 2 +- src/main/asciidoc/_chapters/hbase-default.adoc | 514 +++++++++---------- src/main/asciidoc/_chapters/hbase_history.adoc | 8 +- src/main/asciidoc/_chapters/hbck_in_depth.adoc | 20 +- src/main/asciidoc/_chapters/other_info.adoc | 34 +- src/main/asciidoc/_chapters/performance.adoc | 9 +- src/main/asciidoc/_chapters/rpc.adoc | 12 +- src/main/asciidoc/_chapters/schema_design.adoc | 46 +- src/main/asciidoc/_chapters/security.adoc | 22 +- src/main/asciidoc/_chapters/tracing.adoc | 30 +- src/main/asciidoc/_chapters/unit_testing.adoc | 26 +- src/main/asciidoc/_chapters/zookeeper.adoc | 28 +- 21 files changed, 566 insertions(+), 490 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/appendix_acl_matrix.adoc 
---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/appendix_acl_matrix.adoc b/src/main/asciidoc/_chapters/appendix_acl_matrix.adoc index cb285f3..698ae82 100644 --- a/src/main/asciidoc/_chapters/appendix_acl_matrix.adoc +++ b/src/main/asciidoc/_chapters/appendix_acl_matrix.adoc @@ -65,7 +65,7 @@ Possible permissions include the following: For the most part, permissions work in an expected way, with the following caveats: Having Write permission does not imply Read permission.:: - It is possible and sometimes desirable for a user to be able to write data that same user cannot read. One such example is a log-writing process. + It is possible and sometimes desirable for a user to be able to write data that same user cannot read. One such example is a log-writing process. The [systemitem]+hbase:meta+ table is readable by every user, regardless of the user's other grants or restrictions.:: This is a requirement for HBase to function correctly. `CheckAndPut` and `CheckAndDelete` operations will fail if the user does not have both Write and Read permission.:: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/appendix_hfile_format.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc index d73ddfb..1fdf99f 100644 --- a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc +++ b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc @@ -192,8 +192,11 @@ This format applies to intermediate-level and leaf index blocks of a version 2 m Every non-root index block is structured as follows. . numEntries: the number of entries (int). -. entryOffsets: the ``secondary index'' of offsets of entries in the block, to facilitate a quick binary search on the key (numEntries + 1 int values). 
The last value is the total length of all entries in this index block. - For example, in a non-root index block with entry sizes 60, 80, 50 the ``secondary index'' will contain the following int array: {0, 60, 140, 190}. +. entryOffsets: the "secondary index" of offsets of entries in the block, to facilitate + a quick binary search on the key (`numEntries + 1` int values). The last value + is the total length of all entries in this index block. For example, in a non-root + index block with entry sizes 60, 80, 50 the "secondary index" will contain the + following int array: `{0, 60, 140, 190}`. . Entries. Each entry contains: + http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/architecture.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc index 8122e11..6580719 100644 --- a/src/main/asciidoc/_chapters/architecture.adoc +++ b/src/main/asciidoc/_chapters/architecture.adoc @@ -140,7 +140,7 @@ If a region has both an empty start and an empty end key, it is the only region In the (hopefully unlikely) event that programmatic processing of catalog metadata is required, see the -link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29[Writables] ++++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</a>+++ utility. [[arch.catalog.startup]] @@ -931,7 +931,7 @@ To configure MultiWAL for a RegionServer, set the value of the property `hbase.w </property> ---- -Restart the RegionServer for the changes to take effect. +Restart the RegionServer for the changes to take effect. To disable MultiWAL for a RegionServer, unset the property and restart the RegionServer. @@ -1806,60 +1806,116 @@ This list is not exhaustive. 
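As an aside on the non-root index block layout described in the _appendix_hfile_format.adoc_ hunk above, the cumulative construction of the entryOffsets "secondary index" can be sketched as follows (illustrative Python only, not HBase source code):

```python
# Illustrative sketch (not HBase code): build the "secondary index" of entry
# offsets for a non-root index block from the individual entry sizes.
def entry_offsets(entry_sizes):
    offsets = [0]                     # first entry always starts at offset 0
    for size in entry_sizes:
        offsets.append(offsets[-1] + size)
    # numEntries + 1 values; the last one is the total length of all entries
    return offsets

# The example from the text: entry sizes 60, 80, 50
print(entry_offsets([60, 80, 50]))    # [0, 60, 140, 190]
```

The extra trailing value lets a reader compute the length of the last entry without a special case, which is what makes the binary search on keys cheap.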
To tune these parameters from the defaults, edit the _hbase-default.xml_ file. For a full list of all configuration parameters available, see <<config.files,config.files>> -[cols="1,1a,1", options="header"] -|=== -| Parameter -| Description -| Default - -|`hbase.hstore.compaction.min` -| The minimum number of StoreFiles which must be eligible for compaction before compaction can run. The goal of tuning `hbase.hstore.compaction.min` is to avoid ending up with too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In previous versions of HBase, the parameter hbase.hstore.compaction.min was called `hbase.hstore.compactionThreshold`. -|3 - -|`hbase.hstore.compaction.max` -| The maximum number of StoreFiles which will be selected for a single minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of hbase.hstore.compaction.max controls the length of time it takes a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. For most cases, the default value is appropriate. -|10 - -|`hbase.hstore.compaction.min.size` -| A StoreFile smaller than this size will always be eligible for minor compaction. StoreFiles this size or larger are evaluated by `hbase.hstore.compaction.ratio` to determine if they are eligible. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, this value may need to be reduced in write-heavy environments where many files in the 1-2 MB range are being flushed, because every StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the minimum size and require further compaction. 
If this parameter is lowered, the ratio check is triggered more quickly. This addressed some issues seen in earlier versions of HBase but changing this parameter is no longer necessary in most situations. -|128 MB - -|`hbase.hstore.compaction.max.size` -| An StoreFile larger than this size will be excluded from compaction. The effect of raising `hbase.hstore.compaction.max.size` is fewer, larger StoreFiles that do not get compacted often. If you feel that compaction is happening too often without much benefit, you can try raising this value. -|`Long.MAX_VALUE` - -|`hbase.hstore.compaction.ratio` -| For minor compaction, this ratio is used to determine whether a given StoreFile which is larger than `hbase.hstore.compaction.min.size` is eligible for compaction. Its effect is to limit compaction of large StoreFile. The value of `hbase.hstore.compaction.ratio` is expressed as a floating-point decimal. - -* A large ratio, such as 10, will produce a single giant StoreFile. Conversely, a value of .25, will produce behavior similar to the BigTable compaction algorithm, producing four StoreFiles. -* A moderate value of between 1.0 and 1.4 is recommended. When tuning this value, you are balancing write costs with read costs. Raising the value (to something like 1.4) will have more write costs, because you will compact larger StoreFiles. However, during reads, HBase will need to seek through fewer StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of <<bloom>>. -* Alternatively, you can lower this value to something like 1.0 to reduce the background cost of writes, and use to limit the number of StoreFiles touched during reads. For most cases, the default value is appropriate. -| `1.2F` - -|`hbase.hstore.compaction.ratio.offpeak` -| The compaction ratio used during off-peak compactions, if off-peak hours are also configured (see below). Expressed as a floating-point decimal. 
This allows for more aggressive (or less aggressive, if you set it lower than `hbase.hstore.compaction.ratio`) compaction during a set time period. Ignored if off-peak is disabled (default). This works the same as hbase.hstore.compaction.ratio. -| `5.0F` +`hbase.hstore.compaction.min`:: + The minimum number of StoreFiles which must be eligible for compaction before compaction can run. + The goal of tuning `hbase.hstore.compaction.min` is to avoid ending up with too many tiny StoreFiles + to compact. Setting this value to 2 would cause a minor compaction each time you have two StoreFiles + in a Store, and this is probably not appropriate. If you set this value too high, all the other + values will need to be adjusted accordingly. For most cases, the default value is appropriate. + In previous versions of HBase, the parameter `hbase.hstore.compaction.min` was called + `hbase.hstore.compactionThreshold`. ++ +*Default*: 3 + +`hbase.hstore.compaction.max`:: + The maximum number of StoreFiles which will be selected for a single minor compaction, + regardless of the number of eligible StoreFiles. Effectively, the value of + `hbase.hstore.compaction.max` controls the length of time it takes a single + compaction to complete. Setting it larger means that more StoreFiles are included + in a compaction. For most cases, the default value is appropriate. ++ +*Default*: 10 + +`hbase.hstore.compaction.min.size`:: + A StoreFile smaller than this size will always be eligible for minor compaction. + StoreFiles this size or larger are evaluated by `hbase.hstore.compaction.ratio` + to determine if they are eligible. 
Because this limit represents the "automatic + include" limit for all StoreFiles smaller than this value, this value may need + to be reduced in write-heavy environments where many files in the 1-2 MB range + are being flushed, because every StoreFile will be targeted for compaction and + the resulting StoreFiles may still be under the minimum size and require further + compaction. If this parameter is lowered, the ratio check is triggered more quickly. + This addressed some issues seen in earlier versions of HBase but changing this + parameter is no longer necessary in most situations. ++ +*Default*:128 MB -| `hbase.offpeak.start.hour` -| The start of off-peak hours, expressed as an integer between 0 and 23, inclusive. Set to -1 to disable off-peak. -| `-1` (disabled) +`hbase.hstore.compaction.max.size`:: + A StoreFile larger than this size will be excluded from compaction. The effect of + raising `hbase.hstore.compaction.max.size` is fewer, larger StoreFiles that do not + get compacted often. If you feel that compaction is happening too often without + much benefit, you can try raising this value. ++ +*Default*: `Long.MAX_VALUE` -| `hbase.offpeak.end.hour` -| The end of off-peak hours, expressed as an integer between 0 and 23, inclusive. Set to -1 to disable off-peak. -| `-1` (disabled) +`hbase.hstore.compaction.ratio`:: + For minor compaction, this ratio is used to determine whether a given StoreFile + which is larger than `hbase.hstore.compaction.min.size` is eligible for compaction. + Its effect is to limit compaction of large StoreFile. The value of + `hbase.hstore.compaction.ratio` is expressed as a floating-point decimal. ++ +* A large ratio, such as 10, will produce a single giant StoreFile. Conversely, + a value of .25, will produce behavior similar to the BigTable compaction algorithm, + producing four StoreFiles. +* A moderate value of between 1.0 and 1.4 is recommended. When tuning this value, + you are balancing write costs with read costs. 
Raising the value (to something like + 1.4) will have more write costs, because you will compact larger StoreFiles. + However, during reads, HBase will need to seek through fewer StoreFiles to + accomplish the read. Consider this approach if you cannot take advantage of <<bloom>>. +* Alternatively, you can lower this value to something like 1.0 to reduce the + background cost of writes, and use to limit the number of StoreFiles touched + during reads. For most cases, the default value is appropriate. ++ +*Default*: `1.2F` + +`hbase.hstore.compaction.ratio.offpeak`:: + The compaction ratio used during off-peak compactions, if off-peak hours are + also configured (see below). Expressed as a floating-point decimal. This allows + for more aggressive (or less aggressive, if you set it lower than + `hbase.hstore.compaction.ratio`) compaction during a set time period. Ignored + if off-peak is disabled (default). This works the same as + `hbase.hstore.compaction.ratio`. ++ +*Default*: `5.0F` -| `hbase.regionserver.thread.compaction.throttle` -| There are two different thread pools for compactions, one for large compactions and the other for small compactions. This helps to keep compaction of lean tables (such as `hbase:meta`) fast. If a compaction is larger than this threshold, it goes into the large compaction pool. In most cases, the default value is appropriate. -| `2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size` (which defaults to `128`) +`hbase.offpeak.start.hour`:: + The start of off-peak hours, expressed as an integer between 0 and 23, inclusive. + Set to -1 to disable off-peak. ++ +*Default*: `-1` (disabled) -| `hbase.hregion.majorcompaction` -| Time between major compactions, expressed in milliseconds. Set to 0 to disable time-based automatic major compactions. User-requested and size-based major compactions will still run. 
This value is multiplied by `hbase.hregion.majorcompaction.jitter` to cause compaction to start at a somewhat-random time during a given window of time. -| 7 days (`604800000` milliseconds) +`hbase.offpeak.end.hour`:: + The end of off-peak hours, expressed as an integer between 0 and 23, inclusive. + Set to -1 to disable off-peak. ++ +*Default*: `-1` (disabled) + +`hbase.regionserver.thread.compaction.throttle`:: + There are two different thread pools for compactions, one for large compactions + and the other for small compactions. This helps to keep compaction of lean tables + (such as `hbase:meta`) fast. If a compaction is larger than this threshold, + it goes into the large compaction pool. In most cases, the default value is + appropriate. ++ +*Default*: `2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size` +(which defaults to `128`) + +`hbase.hregion.majorcompaction`:: + Time between major compactions, expressed in milliseconds. Set to 0 to disable + time-based automatic major compactions. User-requested and size-based major + compactions will still run. This value is multiplied by + `hbase.hregion.majorcompaction.jitter` to cause compaction to start at a + somewhat-random time during a given window of time. ++ +*Default*: 7 days (`604800000` milliseconds) -| `hbase.hregion.majorcompaction.jitter` -| A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occur a given amount of time either side of `hbase.hregion.majorcompaction`. The smaller the number, the closer the compactions will happen to the `hbase.hregion.majorcompaction` interval. Expressed as a floating-point decimal. -| `.50F` -|=== +`hbase.hregion.majorcompaction.jitter`:: + A multiplier applied to hbase.hregion.majorcompaction to cause compaction to + occur a given amount of time either side of `hbase.hregion.majorcompaction`. + The smaller the number, the closer the compactions will happen to the + `hbase.hregion.majorcompaction` interval. 
Expressed as a floating-point decimal. ++ +*Default*: `.50F` [[compaction.file.selection.old]] ===== Compaction File Selection @@ -2308,18 +2364,18 @@ To serve the region data from multiple replicas, HBase opens the regions in seco The regions opened in secondary mode will share the same data files with the primary region replica; however, each secondary region replica will have its own MemStore to keep the unflushed data (only the primary region can do flushes). Also, to serve reads from secondary regions, the blocks of data files may also be cached in the block caches for the secondary regions. === Where is the code -This feature is delivered in two phases, Phase 1 and 2. The first phase is done in time for HBase-1.0.0 release. Meaning that using HBase-1.0.x, you can use all the features that are marked for Phase 1. Phase 2 is committed in HBase-1.1.0, meaning all HBase versions after 1.1.0 should contain Phase 2 items. +This feature is delivered in two phases, Phase 1 and 2. The first phase was done in time for the HBase-1.0.0 release, meaning that with HBase-1.0.x you can use all the features that are marked for Phase 1. Phase 2 is committed in HBase-1.1.0, meaning all HBase versions after 1.1.0 should contain Phase 2 items. === Propagating writes to region replicas -As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommeded. +As discussed above, writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms.
For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommended. ==== StoreFile Refresher -The first mechanism is store file refresher which is introduced in HBase-1.0+. Store file refresher is a thread per region server, which runs periodically, and does a refresh operation for the store files of the primary region for the secondary region replicas. If enabled, the refresher will ensure that the secondary region replicas see the new flushed, compacted or bulk loaded files from the primary region in a timely manner. However, this means that only flushed data can be read back from the secondary region replicas, and after the refresher is run, making the secondaries lag behind the primary for an a longer time. +The first mechanism is the store file refresher, which was introduced in HBase-1.0+. The store file refresher is a thread per region server that runs periodically and refreshes the store files of the primary region for the secondary region replicas. If enabled, the refresher will ensure that the secondary region replicas see the new flushed, compacted or bulk loaded files from the primary region in a timely manner. However, this means that only flushed data can be read back from the secondary region replicas, and only after the refresher has run, so the secondaries lag behind the primary for a longer time. -For turning this feature on, you should configure `hbase.regionserver.storefile.refresh.period` to a non-zero value. See Configuration section below. +To turn this feature on, configure `hbase.regionserver.storefile.refresh.period` to a non-zero value. See the Configuration section below.
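A minimal sketch of turning the store file refresher on in _hbase-site.xml_, as described above; the 30000 ms (30 second) period here is only an illustrative value, not a recommendation:

```xml
<!-- hbase-site.xml: enable the store file refresher on RegionServers.
     A non-zero period turns the feature on; 30000 ms is an example value. -->
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>30000</value>
</property>
```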
==== Async WAL Replication -The second mechanism for propagation of writes to secondaries is done via "Async WAL Replication" feature and is only available in HBase-1.1+. This works similarly to HBase's multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. Each secondary replica always receives and observes the writes in the same order that the primary region committed them. In some sense, this design can be thought of as "in-cluster replication", where instead of replicating to a different datacenter, the data goes to secondary regions to keep secondary region's in-memory state up to date. The data files are shared between the primary region and the other replicas, so that there is no extra storage overhead. However, the secondary regions will have recent non-flushed data in their memstores, which increases the memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through wal replication to secondaries. When they observe the flush/compaction or bulk load event, the secondary regions replay the event to pick up the new files and drop the old ones. +The second mechanism for propagating writes to the secondaries is the "Async WAL Replication" feature, which is only available in HBase-1.1+. This works similarly to HBase's multi-datacenter replication, but instead the data from a region is replicated to the secondary regions. Each secondary replica always receives and observes the writes in the same order that the primary region committed them. In some sense, this design can be thought of as "in-cluster replication", where instead of replicating to a different datacenter, the data goes to secondary regions to keep the secondary regions' in-memory state up to date. The data files are shared between the primary region and the other replicas, so that there is no extra storage overhead.
However, the secondary regions will have recent non-flushed data in their memstores, which increases the memory overhead. The primary region writes flush, compaction, and bulk load events to its WAL as well, which are also replicated through WAL replication to the secondaries. When they observe the flush/compaction or bulk load event, the secondary regions replay the event to pick up the new files and drop the old ones. Committing writes in the same order as in the primary ensures that the secondaries won't diverge from the primary region's data, but since the log replication is asynchronous, the data might still be stale in secondary regions. Since this feature works as a replication endpoint, the performance and latency characteristics are expected to be similar to inter-cluster replication. @@ -2332,18 +2388,18 @@ Async WAL Replication feature will add a new replication peer named `region_replica_replication` ---- hbase> disable_peer 'region_replica_replication' ---- -=== Store File TTL -In both of the write propagation approaches mentioned above, store files of the primary will be opened in secondaries independent of the primary region. So for files that the primary compacted away, the secondaries might still be referring to these files for reading. +=== Store File TTL +In both of the write propagation approaches mentioned above, store files of the primary will be opened in the secondaries independently of the primary region. So for files that the primary compacted away, the secondaries might still be referring to these files for reading.
Both features use HFileLinks to refer to files, but there is no protection (yet) for guaranteeing that the file will not be deleted prematurely. Thus, as a guard, you should set the configuration property `hbase.master.hfilecleaner.ttl` to a larger value, such as 1 hour, to guarantee that you will not receive IOExceptions for requests going to replicas. === Region replication for META table's region -Currently, Async WAL Replication is not done for the META table's WAL. The meta table's secondary replicas still refreshes themselves from the persistent store files. Hence the `hbase.regionserver.meta.storefile.refresh.period` needs to be set to a certain non-zero value for refreshing the meta store files. Note that this configuration is configured differently than -`hbase.regionserver.storefile.refresh.period`. +Currently, Async WAL Replication is not done for the META table's WAL. The meta table's secondary replicas still refresh themselves from the persistent store files. Hence, `hbase.regionserver.meta.storefile.refresh.period` needs to be set to a non-zero value for refreshing the meta store files. Note that this property is configured differently from +`hbase.regionserver.storefile.refresh.period`. === Memory accounting The secondary region replicas refer to the data files of the primary region replica, but they have their own memstores (in HBase-1.1+) and use the block cache as well. However, one distinction is that the secondary region replicas cannot flush the data when there is memory pressure for their memstores. They can only free up memstore memory when the primary region does a flush and this flush is replicated to the secondary. On a region server hosting primary replicas for some regions and secondaries for others, the secondaries might cause extra flushes to the primary regions on the same host. In extreme situations, there can be no memory left for adding new writes coming from the primary via WAL replication.
For unblocking this situation (and since the secondary cannot flush by itself), the secondary is allowed to do a "store file refresh" by doing a file system list operation to pick up new files from the primary, and possibly dropping its memstore. This refresh will only be performed if the memstore size of the biggest secondary region replica is at least `hbase.region.replica.storefile.refresh.memstore.multiplier` (default 4) times bigger than the biggest memstore of a primary replica. One caveat is that if this is performed, the secondary can observe partial row updates across column families (since column families are flushed independently). The default should keep this operation from happening frequently. You can set this value to a large number to disable this feature if desired, but be warned that it might cause the replication to block forever. === Secondary replica failover -When a secondary region replica first comes online, or fails over, it may have served some edits from its memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a "region open event" replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message "The region's reads are disabled". However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed.
+When a secondary region replica first comes online, or fails over, it may have served some edits from its memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. To do that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a "region open event" replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with the message "The region's reads are disabled". However, the other replicas will probably still be available to read, thus not causing any impact for RPCs with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed. @@ -2352,7 +2408,7 @@ When a secondary region replica first comes online, or fails over, it may have s To use highly available reads, you should set the following properties in the `hbase-site.xml` file. There is no specific configuration to enable or disable region replicas. -Instead you can change the number of region replicas per table to increase or decrease at the table creation or with alter table. The following configuration is for using async wal replication and using meta replicas of 3. +Instead, you can increase or decrease the number of region replicas per table at table creation or with alter table. The following configuration is for using async WAL replication and 3 replicas for the meta table.
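The memory-accounting guard discussed earlier (the `hbase.region.replica.storefile.refresh.memstore.multiplier` check, default 4) can be sketched roughly as below. This is an illustrative helper only, not the actual HBase implementation:

```python
# Illustrative sketch (not HBase code): a secondary replica is only allowed to
# do an emergency "store file refresh" (and possibly drop its memstore) when
# its biggest memstore has grown well past the biggest primary memstore on the
# same region server.
def should_refresh(biggest_secondary_memstore, biggest_primary_memstore,
                   multiplier=4):  # default multiplier per the text above
    return biggest_secondary_memstore >= multiplier * biggest_primary_memstore

print(should_refresh(512, 100))  # True:  512 >= 4 * 100
print(should_refresh(256, 100))  # False: 256 <  4 * 100
```

Setting the multiplier very high effectively disables the refresh, which matches the warning above that replication might then block forever under memory pressure.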
==== Server side properties @@ -2413,7 +2469,7 @@ Instead you can change the number of region replicas per table to increase or de </property> -<property> +<property> <name>hbase.region.replica.storefile.refresh.memstore.multiplier</name> <value>4</value> <description> @@ -2476,7 +2532,7 @@ Ensure to set the following for all clients (and servers) that will use region r </property> ---- -Note HBase-1.0.x users should use `hbase.ipc.client.allowsInterrupt` rather than `hbase.ipc.client.specificThreadForWriting`. +Note HBase-1.0.x users should use `hbase.ipc.client.allowsInterrupt` rather than `hbase.ipc.client.specificThreadForWriting`. === User Interface http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/asf.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/asf.adoc b/src/main/asciidoc/_chapters/asf.adoc index 77eed8f..47c29e5 100644 --- a/src/main/asciidoc/_chapters/asf.adoc +++ b/src/main/asciidoc/_chapters/asf.adoc @@ -35,13 +35,13 @@ HBase is a project in the Apache Software Foundation and as such there are respo [[asf.devprocess]] === ASF Development Process -See the link:http://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF. +See the link:http://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF. [[asf.reporting]] === ASF Board Reporting Once a quarter, each project in the ASF portfolio submits a report to the ASF board. This is done by the HBase project lead and the committers. -See link:http://www.apache.org/foundation/board/reporting[ASF board reporting] for more information. 
+See link:http://www.apache.org/foundation/board/reporting[ASF board reporting] for more information. :numbered: http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/community.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/community.adoc b/src/main/asciidoc/_chapters/community.adoc index 573fb49..b4c84ca 100644 --- a/src/main/asciidoc/_chapters/community.adoc +++ b/src/main/asciidoc/_chapters/community.adoc @@ -45,18 +45,18 @@ See link:http://search-hadoop.com/m/asM982C5FkS1[HBase, mail # dev - Thoughts The below policy is something we put in place 09/2012. It is a suggested policy rather than a hard requirement. -We want to try it first to see if it works before we cast it in stone. +We want to try it first to see if it works before we cast it in stone. Apache HBase is made of link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components]. Components have one or more <<owner,OWNER>>s. -See the 'Description' field on the link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] JIRA page for who the current owners are by component. +See the 'Description' field on the link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] JIRA page for who the current owners are by component. Patches that fit within the scope of a single Apache HBase component require, at least, a +1 by one of the component's owners before commit. -If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice. +If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice. 
-Patches that span components need at least two +1s before they can be committed, preferably +1s by owners of components touched by the x-component patch (TODO: This needs tightening up but I think fine for first pass). +Patches that span components need at least two +1s before they can be committed, preferably +1s by owners of components touched by the x-component patch (TODO: This needs tightening up but I think fine for first pass). -Any -1 on a patch by anyone vetoes a patch; it cannot be committed until the justification for the -1 is addressed. +Any -1 on a patch by anyone vetoes a patch; it cannot be committed until the justification for the -1 is addressed. [[hbase.fix.version.in.jira]] .How to set fix version in JIRA on issue resolve @@ -67,13 +67,13 @@ If master is going to be 0.98.0 then: * Commit only to master: Mark with 0.98 * Commit to 0.95 and master: Mark with 0.98, and 0.95.x * Commit to 0.94.x and 0.95, and master: Mark with 0.98, 0.95.x, and 0.94.x -* Commit to 89-fb: Mark with 89-fb. -* Commit site fixes: no version +* Commit to 89-fb: Mark with 89-fb. +* Commit site fixes: no version [[hbase.when.to.close.jira]] .Policy on when to set a RESOLVED JIRA as CLOSED -We link:http://search-hadoop.com/m/4cIKs1iwXMS1[agreed] that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent change to the issue must happen in a new JIRA. +We link:http://search-hadoop.com/m/4cIKs1iwXMS1[agreed] that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent change to the issue must happen in a new JIRA. [[no.permanent.state.in.zk]] .Only transient state in ZooKeeper! @@ -81,7 +81,7 @@ We link:http://search-hadoop.com/m/4cIKs1iwXMS1[agreed] that for issues that lis You should be able to kill the data in zookeeper and hbase should ride over it recreating the zk content as it goes. 
This is an old adage around these parts. We just made note of it now. -We also are currently in violation of this basic tenet -- replication at least keeps permanent state in zk -- but we are working to undo this breaking of a golden rule. +We also are currently in violation of this basic tenet -- replication at least keeps permanent state in zk -- but we are working to undo this breaking of a golden rule. [[community.roles]] == Community Roles @@ -90,22 +90,22 @@ We also are currently in violation of this basic tenet -- replication at least k .Component Owner/Lieutenant Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] page. -The owners are listed in the 'Description' field rather than in the 'Component Lead' field because the latter only allows us to list one individual whereas it is encouraged that components have multiple owners. +The owners are listed in the 'Description' field rather than in the 'Component Lead' field because the latter only allows us to list one individual whereas it is encouraged that components have multiple owners. -Owners or component lieutenants are volunteers who are (usually, but not necessarily) expert in their component domain and may have an agenda on how they think their Apache HBase component should evolve. +Owners or component lieutenants are volunteers who are (usually, but not necessarily) expert in their component domain and may have an agenda on how they think their Apache HBase component should evolve. -. Owners will try and review patches that land within their component's scope. -. If applicable, if an owner has an agenda, they will publish their goals or the design toward which they are driving their component +. Owners will try and review patches that land within their component's scope. +. 
If applicable, if an owner has an agenda, they will publish their goals or the design toward which they are driving their component If you would like to volunteer as a component owner, just write the dev list and we'll sign you up. -Owners do not need to be committers. +Owners do not need to be committers. [[hbase.commit.msg.format]] == Commit Message format -We link:http://search-hadoop.com/m/Gwxwl10cFHa1[agreed] to the following Git commit message format: +We link:http://search-hadoop.com/m/Gwxwl10cFHa1[agreed] to the following Git commit message format: [source] ---- HBASE-xxxxx <title>. (<contributor>) ----- -If the person making the commit is the contributor, leave off the '(<contributor>)' element. +---- +If the person making the commit is the contributor, leave off the '(<contributor>)' element. http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/compression.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/compression.adoc b/src/main/asciidoc/_chapters/compression.adoc index 42d4de5..228e883 100644 --- a/src/main/asciidoc/_chapters/compression.adoc +++ b/src/main/asciidoc/_chapters/compression.adoc @@ -144,15 +144,15 @@ In general, you need to weigh your options between smaller size and faster compr The Hadoop shared library has a bunch of facilities including compression libraries and fast crc'ing. To make this facility available to HBase, do the following. HBase/Hadoop will fall back to use alternatives if it cannot find the native library versions -- or fail outright if you are asking for an explicit compressor and there is no alternative available. 
-If you see the following in your HBase logs, you know that HBase was unable to locate the Hadoop native libraries: +If you see the following in your HBase logs, you know that HBase was unable to locate the Hadoop native libraries: [source] ---- 2014-08-07 09:26:20,139 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable ----- -If the libraries loaded successfully, the WARN message does not show. +---- +If the libraries loaded successfully, the WARN message does not show. Let's presume your Hadoop shipped with a native library that suits the platform you are running HBase on. -To check if the Hadoop native library is available to HBase, run the following tool (available in Hadoop 2.1 and greater): +To check if the Hadoop native library is available to HBase, run the following tool (available in Hadoop 2.1 and greater): [source] ---- $ ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker @@ -165,7 +165,7 @@ lz4: false bzip2: false 2014-08-26 13:15:38,863 INFO [main] util.ExitUtil: Exiting with status 1 ---- -Above shows that the native hadoop library is not available in HBase context. +Above shows that the native hadoop library is not available in HBase context. To fix the above, either copy the Hadoop native libraries locally or symlink to them if the Hadoop and HBase installs are adjacent in the filesystem. You could also point at their location by setting the `LD_LIBRARY_PATH` environment variable. @@ -173,20 +173,20 @@ You could also point at their location by setting the `LD_LIBRARY_PATH` environm Where the JVM looks to find native libraries is "system dependent" (See `java.lang.System#loadLibrary(name)`). On linux, by default, it is going to look in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on. 
On a local linux machine, it seems to be the concatenation of the java properties `os.name` and `os.arch` followed by whether 32 or 64 bit. HBase on startup prints out all of the java system properties so find the os.name and os.arch in the log. -For example: +For example: [source] ---- ... 2014-08-06 15:27:22,853 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux 2014-08-06 15:27:22,853 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64 ... ----- +---- So in this case, the PLATFORM string is `Linux-amd64-64`. Copying the Hadoop native libraries or symlinking at _lib/native/Linux-amd64-64_ will ensure they are found. Check with the Hadoop _NativeLibraryChecker_. - -Here is an example of how to point at the Hadoop libs with the `LD_LIBRARY_PATH` environment variable: + +Here is an example of how to point at the Hadoop libs with the `LD_LIBRARY_PATH` environment variable: [source] ---- $ LD_LIBRARY_PATH=~/hadoop-2.5.0-SNAPSHOT/lib/native ./bin/hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker @@ -199,7 +199,7 @@ snappy: true /usr/lib64/libsnappy.so.1 lz4: true revision:99 bzip2: true /lib64/libbz2.so.1 ---- -Set in _hbase-env.sh_ the LD_LIBRARY_PATH environment variable when starting your HBase. +Set in _hbase-env.sh_ the LD_LIBRARY_PATH environment variable when starting your HBase. === Compressor Configuration, Installation, and Use @@ -210,13 +210,13 @@ Before HBase can use a given compressor, its libraries need to be available. Due to licensing issues, only GZ compression is available to HBase (via native Java libraries) in a default installation. Other compression libraries are available via the shared library bundled with your hadoop. The hadoop native library needs to be findable when HBase starts. 
-See +See .Compressor Support On the Master A new configuration setting was introduced in HBase 0.95, to check the Master to determine which data block encoders are installed and configured on it, and assume that the entire cluster is configured the same. This option, `hbase.master.check.compression`, defaults to `true`. -This prevents the situation described in link:https://issues.apache.org/jira/browse/HBASE-6370[HBASE-6370], where a table is created or modified to support a codec that a region server does not support, leading to failures that take a long time to occur and are difficult to debug. +This prevents the situation described in link:https://issues.apache.org/jira/browse/HBASE-6370[HBASE-6370], where a table is created or modified to support a codec that a region server does not support, leading to failures that take a long time to occur and are difficult to debug. If `hbase.master.check.compression` is enabled, libraries for all desired compressors need to be installed and configured on the Master, even if the Master does not run a region server. @@ -232,7 +232,7 @@ See <<brand.new.compressor,brand.new.compressor>>). HBase cannot ship with LZO because of incompatibility between HBase, which uses an Apache Software License (ASL) and LZO, which uses a GPL license. See the link:http://wiki.apache.org/hadoop/UsingLzoCompression[Using LZO - Compression] wiki page for information on configuring LZO support for HBase. + Compression] wiki page for information on configuring LZO support for HBase. If you depend upon LZO compression, consider configuring your RegionServers to fail to start if LZO is not available. See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>. @@ -244,19 +244,19 @@ LZ4 support is bundled with Hadoop. Make sure the hadoop shared library (libhadoop.so) is accessible when you start HBase. 
After configuring your platform (see <<hbase.native.platform,hbase.native.platform>>), you can make a symbolic link from HBase to the native Hadoop libraries. This assumes the two software installs are colocated. -For example, if my 'platform' is Linux-amd64-64: +For example, if my 'platform' is Linux-amd64-64: [source,bourne] ---- $ cd $HBASE_HOME $ mkdir lib/native $ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64 ----- +---- Use the compression tool to check that LZ4 is installed on all nodes. Start up (or restart) HBase. -Afterward, you can create and alter tables to enable LZ4 as a compression codec: +Afterward, you can create and alter tables to enable LZ4 as a compression codec: ---- hbase(main):003:0> alter 'TestTable', {NAME => 'info', COMPRESSION => 'LZ4'} ----- +---- [[snappy.compression.installation]] .Install Snappy Support @@ -347,7 +347,7 @@ You must specify either `-write` or `-update-read` as your first parameter, and ==== ---- -$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h +$ bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -h usage: bin/hbase org.apache.hadoop.hbase.util.LoadTestTool <options> Options: -batchupdate Whether to use batch as opposed to separate http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/configuration.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc index 5a4a6ec..084c47c 100644 --- a/src/main/asciidoc/_chapters/configuration.adoc +++ b/src/main/asciidoc/_chapters/configuration.adoc @@ -564,7 +564,7 @@ If you are running a distributed operation, be sure to wait until HBase has shut === _hbase-site.xml_ and _hbase-default.xml_ Just as in Hadoop where you add site-specific HDFS configuration to the _hdfs-site.xml_ file, for HBase, site specific customizations go into the file _conf/hbase-site.xml_. 
-For the list of configurable properties, see <<hbase_default_configurations,hbase default configurations>> below or view the raw _hbase-default.xml_ source file in the HBase source code at _src/main/resources_. +For the list of configurable properties, see <<hbase_default_configurations,hbase default configurations>> below or view the raw _hbase-default.xml_ source file in the HBase source code at _src/main/resources_. Not all configuration options make it out to _hbase-default.xml_. Configuration that it is thought rare anyone would change can exist only in code; the only way to turn up such configurations is via a reading of the source code itself. @@ -572,7 +572,7 @@ Configuration that it is thought rare anyone would change can exist only in code Currently, changes here will require a cluster restart for HBase to notice the change. // hbase/src/main/asciidoc // -include::../../../../target/asciidoc/hbase-default.adoc[] +include::{docdir}/../../../target/asciidoc/hbase-default.adoc[] [[hbase.env.sh]] @@ -604,7 +604,7 @@ ZooKeeper is where all these values are kept. Thus clients require the location of the ZooKeeper ensemble before they can do anything else. Usually the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`. -If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests). +If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests). 
Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including: [source] @@ -621,7 +621,7 @@ slf4j-log4j (slf4j-log4j12-1.5.8.jar) zookeeper (zookeeper-3.4.2.jar) ---- -An example basic _hbase-site.xml_ for client only might look as follows: +An example basic _hbase-site.xml_ for client only might look as follows: [source,xml] ---- <?xml version="1.0"?> @@ -1002,7 +1002,7 @@ See the link:http://docs.oracle.com/javase/6/docs/technotes/guides/management/ag Historically, besides the port mentioned above, JMX opens two additional random TCP listening ports, which could lead to port conflict problems. (See link:https://issues.apache.org/jira/browse/HBASE-10289[HBASE-10289] for details) As an alternative, you can use the coprocessor-based JMX implementation provided by HBase. -To enable it in 0.99 or above, add the below property in _hbase-site.xml_: +To enable it in 0.99 or above, add the below property in _hbase-site.xml_: [source,xml] ---- @@ -1033,7 +1033,7 @@ The registry port can be shared with connector port in most cases, so you only n However if you want to use SSL communication, the 2 ports must be configured to different values. By default the password authentication and SSL communication are disabled. 
-To enable password authentication, you need to update _hbase-env.sh_ like below: +To enable password authentication, you need to update _hbase-env.sh_ like below: [source,bash] ---- export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.authenticate=true \ @@ -1060,7 +1060,7 @@ keytool -export -alias jconsole -keystore myKeyStore -file jconsole.cert keytool -import -alias jconsole -keystore jconsoleKeyStore -file jconsole.cert ---- -And then update _hbase-env.sh_ like below: +And then update _hbase-env.sh_ like below: [source,bash] ---- @@ -1082,7 +1082,7 @@ Finally start `jconsole` on the client using the key store: jconsole -J-Djavax.net.ssl.trustStore=/home/tianq/jconsoleKeyStore ---- -NOTE: To enable the HBase JMX implementation on Master, you also need to add below property in _hbase-site.xml_: +NOTE: To enable the HBase JMX implementation on Master, you also need to add below property in _hbase-site.xml_: [source,xml] ---- http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/datamodel.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc index 7e9f0ab..646b68b 100644 --- a/src/main/asciidoc/_chapters/datamodel.adoc +++ b/src/main/asciidoc/_chapters/datamodel.adoc @@ -93,7 +93,7 @@ The colon character (`:`) delimits the column family from the column family _qua |=== |Row Key |Time Stamp |ColumnFamily `contents` |ColumnFamily `anchor`|ColumnFamily `people` |"com.cnn.www" |t9 | |anchor:cnnsi.com = "CNN" | -|"com.cnn.www" |t8 | |anchor:my.look.ca = "CNN.com" | +|"com.cnn.www" |t8 | |anchor:my.look.ca = "CNN.com" | |"com.cnn.www" |t6 | contents:html = "<html>..." | | |"com.cnn.www" |t5 | contents:html = "<html>..." | | |"com.cnn.www" |t3 | contents:html = "<html>..." 
| | http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/faq.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/faq.adoc b/src/main/asciidoc/_chapters/faq.adoc index 22e4ad3..6729978 100644 --- a/src/main/asciidoc/_chapters/faq.adoc +++ b/src/main/asciidoc/_chapters/faq.adoc @@ -55,18 +55,18 @@ How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?:: <groupId>org.apache.hbase</groupId> <artifactId>hbase-client</artifactId> <version>0.98.5-hadoop2</version> -</dependency> ----- -+ -.Maven Dependency for HBase 0.96 +</dependency> +---- ++ +.Maven Dependency for HBase 0.96 [source,xml] ---- <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-client</artifactId> <version>0.96.2-hadoop2</version> -</dependency> ----- +</dependency> +---- + .Maven Dependency for HBase 0.94 [source,xml] @@ -75,9 +75,9 @@ How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?:: <groupId>org.apache.hbase</groupId> <artifactId>hbase</artifactId> <version>0.94.3</version> -</dependency> ----- - +</dependency> +---- + === Architecture How does HBase handle Region-RegionServer assignment and locality?:: @@ -91,7 +91,7 @@ Where can I learn about the rest of the configuration options?:: See <<configuration>>. === Schema Design / Data Access - + How should I design my schema in HBase?:: See <<datamodel>> and <<schema>>. http://git-wip-us.apache.org/repos/asf/hbase/blob/623dc130/src/main/asciidoc/_chapters/getting_started.adoc ---------------------------------------------------------------------- diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc index 4134831..1b38e6e 100644 --- a/src/main/asciidoc/_chapters/getting_started.adoc +++ b/src/main/asciidoc/_chapters/getting_started.adoc @@ -57,7 +57,7 @@ Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. 
U .Example /etc/hosts File for Ubuntu ==== -The following _/etc/hosts_ file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble. +The following _/etc/hosts_ file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble. [listing] ---- 127.0.0.1 localhost
