Repository: incubator-predictionio
Updated Branches:
  refs/heads/develop 23a869328 -> dfb01e327
Add solution for HBase failure after disk full

Due to some issues of ZooKeeper, it takes some effort to have HBase recovered from failure caused by full disk.

Project: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/commit/6975fc06
Tree: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/tree/6975fc06
Diff: http://git-wip-us.apache.org/repos/asf/incubator-predictionio/diff/6975fc06

Branch: refs/heads/develop
Commit: 6975fc06bad76ad275d10a17af80387c80e60fbd
Parents: 3525049
Author: Amy Lin <[email protected]>
Authored: Mon Mar 13 09:40:33 2017 -0700
Committer: Donald Szeto <[email protected]>
Committed: Mon Mar 13 09:40:33 2017 -0700

----------------------------------------------------------------------
 docs/manual/source/resources/faq.html.md | 32 +++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-predictionio/blob/6975fc06/docs/manual/source/resources/faq.html.md
----------------------------------------------------------------------
diff --git a/docs/manual/source/resources/faq.html.md b/docs/manual/source/resources/faq.html.md
index f80167b..455d06c 100644
--- a/docs/manual/source/resources/faq.html.md
+++ b/docs/manual/source/resources/faq.html.md
@@ -216,3 +216,35 @@ there could be a chance that reverse DNS does not function properly. You can
 install a DNS server on your own computer. Some users have reported that using
 [Google Public DNS](https://developers.google.com/speed/public-dns/) would also
 solve the problem.
+
+### Q: How to fix Hbase issues after disk recovered from full state?
+
+You may receive error messages like `write error: No space left on device`
+when disk is full, and also receive error from `pio status` even after
+restarting pio services (due to
+[an issue](https://issues.apache.org/jira/browse/ZOOKEEPER-1621) in ZooKeeper).
+
+The workaround is to delete newest `snapshot.xxxxx` and `log.xxxoo` under
+zookeeper data directory (ex: `$(HbaseRoot)/zookeeper/zookeeper_0/version-2`). Then
+restart all service with `pio-start-all`, and `pio status` will give you good answer.
+
+But If you still have problems connecting to event server, go checkout Hbase
+dashboard to see if there are `regions under transition`, then follow the steps:
+
+1. Try `hbase hbck -repair` and `hbase hbck -repairHoles`. If it solves the
+problem then you are all set, otherwise continue on.
+2. Find out the failing regions by `hbase hbck`.
+
+    ```
+    ...
+    Summary:
+    Table pio_event:events_1 is inconsistent.
+    Number of regions: 2
+    Deployed on: prediction.io,54829,1489213832255
+    ...
+    2 inconsistencies detected.
+    ```
+3. Shutdown Hbase process and delete `recovered.edits` folders under hbase data
+directory (ex: `$(HbaseRoot)/hbase/data/pio_event/events_1` in this example)
+for failing regions.
+4. Run `hbase hbck -repairHoles` and restart all pio services.
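
The FAQ entry added by this commit asks the reader to locate the newest `snapshot.*` and `log.*` files in the ZooKeeper data directory and the `recovered.edits` folders of the failing regions. The following is a minimal Python sketch of how those candidates could be listed before deleting anything by hand. It is not part of the commit. It rests on two assumptions: that the file suffix is a hexadecimal transaction id, so the largest value marks the newest file, and that each region lives in its own subdirectory of the table directory. The function names are hypothetical, and the code only lists paths; it never deletes them.

```python
from pathlib import Path


def newest_by_hex_suffix(data_dir, prefix):
    """Return the file named '<prefix>.<hex>' with the largest hex suffix, or None.

    Assumption: the suffix is a hexadecimal transaction id, so the
    largest value identifies the newest file.
    """
    candidates = []
    for path in Path(data_dir).glob(prefix + ".*"):
        suffix = path.name[len(prefix) + 1:]
        try:
            candidates.append((int(suffix, 16), path))
        except ValueError:
            continue  # skip names whose suffix is not hexadecimal
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def zookeeper_cleanup_candidates(version_dir):
    """List the newest snapshot.* and log.* files in a version-2 directory."""
    found = []
    for prefix in ("snapshot", "log"):
        newest = newest_by_hex_suffix(version_dir, prefix)
        if newest is not None:
            found.append(newest)
    return found


def recovered_edits_dirs(table_dir):
    """List recovered.edits directories one level below an HBase table directory.

    Assumption: every region has its own subdirectory under the table
    directory. All regions are listed; picking the failing ones is left
    to the reader, based on the `hbase hbck` output.
    """
    return sorted(p for p in Path(table_dir).glob("*/recovered.edits") if p.is_dir())
```

Calling `zookeeper_cleanup_candidates` on the `version-2` directory and `recovered_edits_dirs` on the table directory from the FAQ example prints nothing by itself; the returned paths can be reviewed and then removed manually while the services are stopped.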
