Repository: incubator-predictionio
Updated Branches:
  refs/heads/livedoc 352504965 -> ef0e26941


Add solution for HBase failure after disk full

Because of a ZooKeeper issue, recovering HBase from a failure caused by a
full disk takes some effort.


Project: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-predictionio/commit/6975fc06
Tree: 
http://git-wip-us.apache.org/repos/asf/incubator-predictionio/tree/6975fc06
Diff: 
http://git-wip-us.apache.org/repos/asf/incubator-predictionio/diff/6975fc06

Branch: refs/heads/livedoc
Commit: 6975fc06bad76ad275d10a17af80387c80e60fbd
Parents: 3525049
Author: Amy Lin <[email protected]>
Authored: Mon Mar 13 09:40:33 2017 -0700
Committer: Donald Szeto <[email protected]>
Committed: Mon Mar 13 09:40:33 2017 -0700

----------------------------------------------------------------------
 docs/manual/source/resources/faq.html.md | 32 +++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/6975fc06/docs/manual/source/resources/faq.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/source/resources/faq.html.md b/docs/manual/source/resources/faq.html.md
index f80167b..455d06c 100644
--- a/docs/manual/source/resources/faq.html.md
+++ b/docs/manual/source/resources/faq.html.md
@@ -216,3 +216,35 @@ there could be a chance that reverse DNS does not function properly. You can
 install a DNS server on your own computer. Some users have reported that using
 [Google Public DNS](https://developers.google.com/speed/public-dns/) would also
 solve the problem.
+
+### Q: How do I fix HBase issues after the disk recovers from a full state?
+
+You may see error messages like `write error: No space left on device`
+when the disk is full, and `pio status` may keep reporting errors even after
+you restart the pio services (due to
+[an issue](https://issues.apache.org/jira/browse/ZOOKEEPER-1621) in ZooKeeper).
+
+The workaround is to delete the newest `snapshot.xxxxx` and `log.xxxoo` files under the
+ZooKeeper data directory (e.g. `$(HbaseRoot)/zookeeper/zookeeper_0/version-2`). Then
+restart all services with `pio-start-all`; `pio status` should then report success.
+
+If you still have problems connecting to the event server, check the HBase
+dashboard for `regions under transition`, and if there are any, follow these steps:
+
+1. Try `hbase hbck -repair` and `hbase hbck -repairHoles`. If that solves the
+problem, you are all set; otherwise continue.
+2. Find the failing regions with `hbase hbck`.
+
+       ```
+         ...
+       Summary:
+       Table pio_event:events_1 is inconsistent.
+           Number of regions: 2
+           Deployed on:  prediction.io,54829,1489213832255
+         ...
+         2 inconsistencies detected.
+       ```
+3. Shut down the HBase process and delete the `recovered.edits` folders under the HBase data
+directory (`$(HbaseRoot)/hbase/data/pio_event/events_1` in this example)
+for the failing regions.
+4. Run `hbase hbck -repairHoles` and restart all pio services.
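
The ZooKeeper workaround and step 2 of the FAQ entry above could be sketched in shell roughly as follows. This is only an illustration and is not part of the commit: the helper names `newest_file`, `quarantine_newest_zk_files` and `inconsistent_tables` are hypothetical, "newest" is assumed to mean most recently modified, and the sketch moves the files to a backup directory instead of deleting them so the step can be undone. Stop the pio services before touching the ZooKeeper data directory.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from PredictionIO or HBase.

# Print the most recently modified file matching PREFIX.* in DIR.
# Assumption: the "newest" file is the one with the latest modification time.
newest_file() {
    dir=$1
    prefix=$2
    # ls -t sorts newest first; errors (no match) are silenced.
    ls -t "$dir"/"$prefix".* 2>/dev/null | head -n 1
}

# Move the newest snapshot.* and log.* files out of the ZooKeeper data
# directory into BACKUP_DIR. The FAQ deletes them; moving is a more
# cautious variant chosen for this sketch.
quarantine_newest_zk_files() {
    zk_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for prefix in snapshot log; do
        f=$(newest_file "$zk_dir" "$prefix")
        if [ -n "$f" ]; then
            echo "moving $f"
            mv "$f" "$backup_dir"/ || return 1
        fi
    done
}

# Read `hbase hbck` output on stdin and print the names of tables that the
# summary reports as inconsistent (lines like "Table X is inconsistent.").
inconsistent_tables() {
    awk '$1 == "Table" && /is inconsistent/ { print $2 }'
}
```

A possible invocation, assuming a hypothetical `HBASE_ROOT` variable that points at the directory the FAQ calls `$(HbaseRoot)`: `quarantine_newest_zk_files "$HBASE_ROOT/zookeeper/zookeeper_0/version-2" /tmp/zk-backup`, followed later by `hbase hbck | inconsistent_tables`.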
