HBASE-20354 better docs for impact of proactively checking hsync support.

* Add to the quickstart guide disabling the hsync check, with a
  big warning about how we'll lose data if the check is disabled.
* Add to troubleshooting section so folks searching can get a pointer.

Signed-off-by: Mike Drob <md...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/8014c5c3
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/8014c5c3
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/8014c5c3

Branch: refs/heads/HBASE-19064
Commit: 8014c5c3ac510e81592ba1b4193d2fd4177a991c
Parents: de57d08
Author: Sean Busbey <bus...@apache.org>
Authored: Thu Apr 5 11:52:00 2018 -0500
Committer: Sean Busbey <bus...@apache.org>
Committed: Fri Apr 6 13:34:04 2018 -0500

----------------------------------------------------------------------
 .../asciidoc/_chapters/getting_started.adoc     | 46 +++++++++++++--
 src/main/asciidoc/_chapters/shell.adoc          |  6 +-
 .../asciidoc/_chapters/troubleshooting.adoc     | 62 ++++++++++++++++++++
 3 files changed, 107 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/8014c5c3/src/main/asciidoc/_chapters/getting_started.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/getting_started.adoc 
b/src/main/asciidoc/_chapters/getting_started.adoc
index 9e0bbdf..0edddfa 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -82,7 +82,7 @@ JAVA_HOME=/usr
 +
 
 . Edit _conf/hbase-site.xml_, which is the main HBase configuration file.
-  At this time, you only need to specify the directory on the local filesystem 
where HBase and ZooKeeper write data.
+  At this time, you need to specify the directory on the local filesystem 
where HBase and ZooKeeper write data and acknowledge some risks.
   By default, a new directory is created under /tmp.
   Many servers are configured to delete the contents of _/tmp_ upon reboot, so 
you should store the data elsewhere.
   The following configuration will store HBase's data in the _hbase_ 
directory, in the home directory of the user called `testuser`.
@@ -102,6 +102,21 @@ JAVA_HOME=/usr
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/home/testuser/zookeeper</value>
   </property>
+  <property>
+    <name>hbase.unsafe.stream.capability.enforce</name>
+    <value>false</value>
+    <description>
+      Controls whether HBase will check for stream capabilities (hflush/hsync).
+
+      Disable this if you intend to run on LocalFileSystem, denoted by a 
rootdir
+      with the 'file://' scheme, but be mindful of the NOTE below.
+
+      WARNING: Setting this to false blinds you to potential data loss and
+      inconsistent system state in the event of process and/or node failures. 
If
+      HBase is complaining of an inability to use hsync or hflush it's most
+      likely not a false positive.
+    </description>
+  </property>
 </configuration>
 ----
 ====
@@ -111,7 +126,14 @@ HBase will do this for you.  If you create the directory,
 HBase will attempt to do a migration, which is not what you want.
 +
 NOTE: The _hbase.rootdir_ in the above example points to a directory
-in the _local filesystem_. The 'file:/' prefix is how we denote local 
filesystem.
+in the _local filesystem_. The 'file://' prefix is how we denote local
+filesystem. You should take the WARNING present in the configuration example
+to heart. In standalone mode HBase makes use of the local filesystem 
abstraction
+from the Apache Hadoop project. That abstraction doesn't provide the durability
+promises that HBase needs to operate safely. This is fine for local development
+and testing use cases where the cost of cluster failure is well contained. It 
is
+not appropriate for production deployments; eventually you will lose data.
+
 To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to 
point at a
 directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_.
 For more on this variant, see the section below on Standalone HBase over HDFS.
@@ -163,7 +185,7 @@ hbase(main):001:0> create 'test', 'cf'
 
 . List Information About your Table
 +
-Use the `list` command to
+Use the `list` command to confirm your table exists
 +
 ----
 hbase(main):002:0> list 'test'
@@ -174,6 +196,22 @@ test
 => ["test"]
 ----
 
++
+Now use the `describe` command to see details, including configuration defaults
++
+----
+hbase(main):003:0> describe 'test'
+Table test is ENABLED
+test
+COLUMN FAMILIES DESCRIPTION
+{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', 
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', 
CACHE_DATA_ON_WRITE =>
+'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
+alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
'true', BLOCKSIZE
+ => '65536'}
+1 row(s)
+Took 0.9998 seconds
+----
+
 . Put data into your table.
 +
 To put data into your table, use the `put` command.
@@ -314,7 +352,7 @@ First, add the following property which directs HBase to 
run in distributed mode
 ----
 +
 Next, change the `hbase.rootdir` from the local filesystem to the address of 
your HDFS instance, using the `hdfs:////` URI syntax.
-In this example, HDFS is running on the localhost at port 8020.
+In this example, HDFS is running on the localhost at port 8020. Be sure to 
either remove the entry for `hbase.unsafe.stream.capability.enforce` or set it 
to true.
 +
 [source,xml]
 ----

http://git-wip-us.apache.org/repos/asf/hbase/blob/8014c5c3/src/main/asciidoc/_chapters/shell.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/shell.adoc 
b/src/main/asciidoc/_chapters/shell.adoc
index 13b8dd1..b246ab1 100644
--- a/src/main/asciidoc/_chapters/shell.adoc
+++ b/src/main/asciidoc/_chapters/shell.adoc
@@ -227,7 +227,7 @@ The table reference can be used to perform data read write 
operations such as pu
 For example, previously you would always specify a table name:
 
 ----
-hbase(main):000:0> create ‘t’, ‘f’
+hbase(main):000:0> create 't', 'f'
 0 row(s) in 1.0970 seconds
 hbase(main):001:0> put 't', 'rold', 'f', 'v'
 0 row(s) in 0.0080 seconds
@@ -291,7 +291,7 @@ hbase(main):012:0> tab = get_table 't'
 0 row(s) in 0.0010 seconds
 
 => Hbase::Table - t
-hbase(main):013:0> tab.put ‘r1’ ,’f’, ‘v’
+hbase(main):013:0> tab.put 'r1' ,'f', 'v'
 0 row(s) in 0.0100 seconds
 hbase(main):014:0> tab.scan
 ROW                                COLUMN+CELL
@@ -305,7 +305,7 @@ You can then use jruby to script table operations based on 
these names.
 The list_snapshots command also acts similarly.
 
 ----
-hbase(main):016 > tables = list(‘t.*’)
+hbase(main):016 > tables = list('t.*')
 TABLE
 t
 1 row(s) in 0.1040 seconds

http://git-wip-us.apache.org/repos/asf/hbase/blob/8014c5c3/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc 
b/src/main/asciidoc/_chapters/troubleshooting.adoc
index eb62b33..58880c6 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -941,6 +941,45 @@ java.lang.UnsatisfiedLinkError: no gplcompression in 
java.library.path
 \... then there is a path issue with the compression libraries.
 See the Configuration section on link:[LZO compression configuration].
 
+[[trouble.rs.startup.hsync]]
+==== RegionServer aborts due to lack of hsync for filesystem
+
+In order to provide data durability for writes to the cluster HBase relies on 
the ability to durably save state in a write ahead log. When using a version of 
Apache Hadoop Common's filesystem API that supports checking on the 
availability of needed calls, HBase will proactively abort the cluster if it 
finds it can't operate safely.
+
+For RegionServer roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:36:22,785 ERROR [regionserver/192.168.1.123:16020] 
wal.AsyncFSWALProvider: The RegionServer async write ahead log provider relies 
on the ability to call hflush and hsync for proper operation during component 
failures, but the current FileSystem does not support doing so. Please check 
the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount 
that has suitable capabilities for output streams.
+2018-04-05 11:36:22,799 ERROR [regionserver/192.168.1.123:16020] 
regionserver.HRegionServer: ***** ABORTING region server 
192.168.1.123,16020,1522946074234: Unhandled: cannot get log writer *****
+java.io.IOException: cannot get log writer
+        at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:112)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:612)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:124)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:489)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:251)
+        at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:69)
+        at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:44)
+        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
+        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
+        at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:252)
+        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2105)
+        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1326)
+        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1191)
+        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1007)
+        at java.lang.Thread.run(Thread.java:745)
+Caused by: 
org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
hflush and hsync
+        at 
org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:69)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:168)
+        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167)
+        at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:99)
+        ... 15 more
+
+----
+
+If you are attempting to run in standalone mode and see this error, please 
walk back through the section <<quickstart>> and ensure you have included *all* 
the given configuration settings.
+
+
 [[trouble.rs.runtime]]
 === Runtime Errors
 
@@ -1127,6 +1166,29 @@ Sure fire solution is to just use Hadoop dfs to delete 
the HBase root and let HB
 
 If you have many regions on your cluster and you see an error like that 
reported above in this sections title in your logs, see 
link:https://issues.apache.org/jira/browse/HBASE-4246[HBASE-4246 Cluster with 
too many regions cannot withstand some master failover scenarios].
 
+[[trouble.master.startup.hsync]]
+==== Master fails to become active due to lack of hsync for filesystem
+
+HBase's internal framework for cluster operations requires the ability to 
durably save state in a write ahead log. When using a version of Apache Hadoop 
Common's filesystem API that supports checking on the availability of needed 
calls, HBase will proactively abort the cluster if it finds it can't operate 
safely.
+
+For Master roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:18:44,653 ERROR [Thread-21] master.HMaster: Failed to become 
active master
+java.lang.IllegalStateException: The procedure WAL relies on the ability to 
hsync for proper operation during component failures, but the underlying 
filesystem does not support doing so. Please check the config value of 
'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness 
and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount 
that can provide it.
+        at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1034)
+        at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:374)
+        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:530)
+        at 
org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1267)
+        at 
org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1173)
+        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:881)
+        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2048)
+        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:568)
+        at java.lang.Thread.run(Thread.java:745)
+----
+
+If you are attempting to run in standalone mode and see this error, please 
walk back through the section <<quickstart>> and ensure you have included *all* 
the given configuration settings.
+
 [[trouble.master.shutdown]]
 === Shutdown Errors
 

Reply via email to