Re: Hmaster cannot start running again

ramkrishna vasudevan Wed, 02 Jan 2013 20:23:04 -0800

Hi Dalia

The NameNode SafeModeException should not be related to the number of region
servers.
Anyway, turn off safe mode and try restarting your cluster. Maybe the
NameNode is internally unable to locate some blocks.
That needs more investigation.
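
As a rough illustration of the point above, the safe mode message quoted further down can be checked by hand: it counts HDFS blocks reported by DataNodes, not region servers. The Python sketch below parses that one message and redoes the arithmetic. The function name and the rounding with ceil are assumptions made for illustration; the NameNode's own rounding may differ slightly.

```python
import math
import re

# Message copied from the NameNode log quoted in this thread.
SAFE_MODE_MSG = (
    "Safe mode is ON. The reported blocks 43 needs additional 8 blocks "
    "to reach the threshold 0.9990 of total blocks 51. "
    "Safe mode will be turned off automatically."
)

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(
    r"reported blocks (\d+) needs additional (\d+) blocks\s+"
    r"to reach the threshold ([\d.]+) of total blocks (\d+)"
)


def parse_safe_mode(msg):
    """Return the block counts from a safe mode message, or None if absent."""
    m = PATTERN.search(msg)
    if m is None:
        return None
    reported = int(m.group(1))
    additional = int(m.group(2))
    threshold = float(m.group(3))
    total = int(m.group(4))
    # Assumption: the required count is the threshold fraction rounded up.
    required = math.ceil(total * threshold)
    return {
        "reported": reported,
        "additional": additional,
        "threshold": threshold,
        "total": total,
        "required_estimate": required,
        "missing_estimate": max(0, required - reported),
        "reported_fraction": reported / total,
    }


info = parse_safe_mode(SAFE_MODE_MSG)
print(info)
```

With the numbers from the log, 43 of 51 blocks are reported (about 84%), well short of the 0.9990 threshold, so the NameNode stays in safe mode until the DataNodes holding the remaining blocks report in. On Hadoop versions of this era, `hadoop dfsadmin -safemode get` shows the current state, `hadoop dfsadmin -safemode leave` forces it off, and `hadoop fsck /` lists missing or corrupt blocks.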


Regards
Ram

On Thu, Jan 3, 2013 at 8:13 AM, varun kumar <[email protected]> wrote:

> Hi Dalia,
>
> Safe mode is on.
>
> Turn off safe mode and you will be able to write files into that cluster.
>
> The Hadoop cluster will turn off safe mode automatically when it gets its
> required blocks.
>
> In your scenario, try starting 2 more region servers.
>
> Regards,
> Varun Kumar.P
>
>
> On Wed, Jan 2, 2013 at 11:30 PM, Dalia Sobhy <[email protected]> wrote:
>
> >
> > Dear all,
> >
> > I first started 2 region servers and added 6 million records to them.
> > Then I added about 3 region servers and everything was fine. I ran a Java
> > program on them and they were working properly. But after stopping four
> > region servers, the HMaster is not working. I added another region server,
> > but it still doesn't work and I don't know why.
> >
> > From namenode log file:
> > Safe mode is ON. The reported blocks 43 needs additional 8 blocks
> >  to reach the threshold 0.9990 of total blocks 51. Safe mode will be
> > turned off automatically.
> >
> > From HMaster log file:
> > resubmitting task /hbase/splitlog/hdfs%3A%2F%2Fslave7.medcloud.com%3A8020%2Fhbase%2F.logs%2Fslave1.medcloud.com%2C60020%2C1357145569383-splitting%2Fslave1.medcloud.com%252C60020%252C1357145569383.1357145572049
> >
> > org.apache.hadoop.hbase.master.SplitLogManager task /hbase/splitlog/RESCAN0000000832 entered state done slave7.medcloud.com,60000,1357148608279
> >
> > org.apache.hadoop.hbase.util.FSUtils Waiting for dfs to exit safe mode...
> >
> > The last line is repeated a lot.
> >
> > From region server log file:
> >
> > org.apache.zookeeper.ZooKeeper Client environment:java.library.path=/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:java.io.tmpdir=/tmp
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:java.compiler=<NA>
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:os.name=Linux
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:os.arch=amd64
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:os.version=3.2.0-29-generic
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:user.name=hbase
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:user.home=/var/lib/hbase
> > 7:38:58.778 PM INFO org.apache.zookeeper.ZooKeeper Client environment:user.dir=/run/cloudera-scm-agent/process/747-hbase-REGIONSERVER
> > 7:38:58.792 PM INFO org.apache.zookeeper.ZooKeeper Initiating client connection, connectString=slave4.medcloud.com:2181 sessionTimeout=60000 watcher=regionserver:60020
> > 7:38:58.924 PM INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper The identifier of this process is [email protected]
> > 7:38:58.995 PM INFO org.apache.zookeeper.ClientCnxn Opening socket connection to server slave4.medcloud.com/192.168.0.5:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> > 7:38:59.022 PM INFO org.apache.zookeeper.ClientCnxn Socket connection established to slave4.medcloud.com/192.168.0.5:2181, initiating session
> > 7:38:59.047 PM INFO org.apache.zookeeper.ClientCnxn Session establishment complete on server slave4.medcloud.com/192.168.0.5:2181, sessionid = 0x13bfc435d0c000b, negotiated timeout = 40000
> > 7:39:00.485 PM INFO org.apache.hadoop.hbase.regionserver.ShutdownHook Installed shutdown hook thread: Shutdownhook:regionserver60020
> >
> > From ZooKeeper log file:
> >
> > org.apache.zookeeper.server.PrepRequestProcessor Got user-level KeeperException when processing sessionid:0x13bfc435d0c000d type:delete cxid:0xa zxid:0x14a6 txntype:-1 reqpath:n/a Error Path:/hbase/backup-masters/slave7.medcloud.com,60000,1357148608279 Error:KeeperErrorCode = NoNode for /hbase/backup-masters/slave7.medcloud.com,60000,1357148608279
> >
> >
> > I tried deploying the client configurations and then restarting the
> > cluster, but got the same error. I tried restarting the machines and got
> > the same error as well.
> >
> > Any help would be appreciated.
> >
> > Sometimes I get this error in the log file:
> > Failed to start master java.lang.RuntimeException: HMaster Aborted
> >
> >
> >
> >
>
>
>
>
> --
> Regards,
> Varun Kumar.P
>
