[ 
https://issues.apache.org/jira/browse/HBASE-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9563:
-------------------------

    Attachment: 9563.txt

Make it so we do not return non-zero if file w/ master znode is not specified 
or found.
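
A rough sketch of that exit-code policy, for illustration only. This is not the 
code in 9563.txt: the class ZnodeFileCheck and method clearExitCode are 
hypothetical names, and the ZooKeeper delete itself is left out (assumed to 
happen where the comment says so).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;

/** Hypothetical sketch of the exit-code policy; not the code in 9563.txt. */
final class ZnodeFileCheck {

  /**
   * Exit code for a "clear master znode" step.
   * znodeFile is the local file that records the master znode path;
   * it may be null or empty when none was specified.
   */
  static int clearExitCode(String znodeFile) {
    if (znodeFile == null || znodeFile.isEmpty()) {
      return 0; // nothing specified, so nothing to clean
    }
    try {
      byte[] raw = Files.readAllBytes(Paths.get(znodeFile));
      String znodePath = new String(raw, StandardCharsets.UTF_8).trim();
      // Assumption: a real cleaner would delete znodePath from ZooKeeper here.
      System.out.println("would clear znode: " + znodePath);
      return 0;
    } catch (NoSuchFileException e) {
      return 0; // file not found: treat as already clean
    } catch (IOException e) {
      return 1; // any other read failure is still reported
    }
  }
}
```

With this shape, a caller that gates the restart on the cleaner's exit code 
would no longer be blocked by a missing /tmp/hbase-hbase-master.znode.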

Was going to try this first, but it likely needs more.  What we see is that the 
master is killed but sticks around.  30 seconds later, we go to start a master, 
but it fails because the JMX port is occupied.  This new master, on its way out, 
tries to clear the master znode.  Its clearing of the znode seems to make the 
stuck master fail, so we have two znode cleaners running at about the same 
time.  Here is the log:

{code}
2013-10-07 05:06:29,713 INFO  [AM.ZK.Worker-pool2-t559] master.RegionStates: Offlined d2ff3ce62dacd333d98feee91f620f8a from a1806.halxg.cloudera.com,60020,1381147238190
2013-10-07 05:06:57,313 INFO  [main] util.VersionInfo: HBase 0.96.0
2013-10-07 05:06:57,313 INFO  [main] util.VersionInfo: Subversion git://hbase-jenkins.ent.cloudera.com/var/lib/jenkins/jobs/hbase-096/workspace -r 06a2800d3faf83aec482c210c61d453ce8e759bc
2013-10-07 05:06:57,313 INFO  [main] util.VersionInfo: Compiled by jenkins on Mon Oct  7 00:11:57 PDT 2013
Mon Oct  7 05:06:57 PDT 2013 Starting master on a1805.halxg.cloudera.com
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 386225
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=a1805.halxg.cloudera.com
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_25
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/opt/toolchain/sun-jdk-64bit-1.7.0.25/jre
2013-10-07 05:06:57,616 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hbase/current/bin/../conf:/opt/toolchain/sun-jdk-64bit-1.7.0.25/lib/tools.jar:/opt/hbase/current/bin/#
2013-10-07 05:06:57,617 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop/hadoop-2.1.0-beta/lib/native
2013-10-07 05:06:57,617 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2013-10-07 05:06:57,617 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2013-10-07 05:06:57,617 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2013-10-07 05:06:57,617 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2013-10-07 05:06:57,618 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.2.0-43-generic
2013-10-07 05:06:57,618 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hbase
2013-10-07 05:06:57,618 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
2013-10-07 05:06:57,618 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hbase
2013-10-07 05:06:57,619 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=clean znode for master
2013-10-07 05:06:57,651 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
2013-10-07 05:06:57,655 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt #
2013-10-07 05:06:57,661 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
2013-10-07 05:06:57,685 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = #
2013-10-07 05:06:59,677 INFO  [main] util.VersionInfo: HBase 0.96.0
2013-10-07 05:06:59,677 INFO  [main] util.VersionInfo: Subversion git://hbase-jenkins.ent.cloudera.com/var/lib/jenkins/jobs/hbase-096/workspace -r 06a2800d3faf83aec482c210c61d453ce8e759bc
2013-10-07 05:06:59,678 INFO  [main] util.VersionInfo: Compiled by jenkins on Mon Oct  7 00:11:57 PDT 2013
2013-10-07 05:06:59,971 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-10-07 05:06:59,971 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=a1805.halxg.cloudera.com
2013-10-07 05:06:59,971 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_25
2013-10-07 05:07:00,008 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2013-10-07 05:07:00,008 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/opt/toolchain/sun-jdk-64bit-1.7.0.25/jre
2013-10-07 05:07:00,008 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hbase/current/bin/../conf:/opt/toolchain/sun-jdk-64bit-1.7.0.25/lib/tools.jar:/opt/hbase/current/bin/#
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop/hadoop-2.1.0-beta/lib/native
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2013-10-07 05:07:00,009 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.2.0-43-generic
2013-10-07 05:07:00,010 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hbase
2013-10-07 05:07:00,010 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
2013-10-07 05:07:00,010 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hbase
2013-10-07 05:07:00,011 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=clean znode for master
2013-10-07 05:07:00,042 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
2013-10-07 05:07:00,043 WARN  [main] hbase.ZNodeClearer: Can't read the content of the znode file
java.io.FileNotFoundException: /tmp/hbase-hbase-master.znode (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:97)
        at java.io.FileReader.<init>(FileReader.java:58)
        at org.apache.hadoop.hbase.ZNodeClearer.readMyEphemeralNodeOnDisk(ZNodeClearer.java:95)
        at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:143)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2787)
2013-10-07 05:07:00,046 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt #
{code}

Elliott suggests running the master in autorestart mode or, beyond that, having 
the master restart retry (there doesn't seem to be an easy facility for this in 
the ClusterManager interface at the moment).
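
For the retry idea, something along these lines might do. It is only a sketch: 
RestartRetry, startWithRetry, and the attempt and pause parameters are 
hypothetical and not part of the ClusterManager interface; the start attempt is 
passed in as a callback that reports whether the master came up.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry helper; not part of HBase's ClusterManager. */
final class RestartRetry {

  /**
   * Calls startAttempt until it reports success or maxAttempts is reached.
   * Returns the 1-based attempt that succeeded, or -1 if none did.
   */
  static int startWithRetry(BooleanSupplier startAttempt, int maxAttempts, long pauseMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (startAttempt.getAsBoolean()) {
        return attempt;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis); // give the old process time to release its ports
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return -1;
        }
      }
    }
    return -1;
  }
}
```

A pause longer than the time the killed master takes to exit would let a later 
attempt find the JMX port free.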

> Autorestart doesn't work if zkcleaner fails
> -------------------------------------------
>
>                 Key: HBASE-9563
>                 URL: https://issues.apache.org/jira/browse/HBASE-9563
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: stack
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9563.txt
>
>
> I've seen this several times where a master didn't autorestart because zk 
> cleaner failed.  We should still restart the daemon even if it's not possible 
> to clean the zk nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
