[ https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618549#comment-13618549 ]
Jieshan Bean commented on HBASE-8230:
-------------------------------------
bq. Did the failure happen when region server restarted?
Yes.
bq.If this was repeatable, I would suggest finding the root cause.
The root cause in our environment was that the NameNode was in safe mode:
{noformat}
2013-03-29 10:32:42,260 FATAL [regionserver26003] ABORTING region server om-host2,26003,1364524173470: Unhandled exception: cannot get log writer
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1737)
java.io.IOException: cannot get log writer
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:757)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:701)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:637)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:582)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:436)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:362)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1327)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1316)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1030)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:706)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/hbase/.logs/om-host2,26003,1364524173470/om-host2%2C26003%2C1364524173470.1364524361366. Name node is in safe mode.
The reported blocks 14 has reached the threshold 0.9990 of total blocks 14.
Safe mode will be turned off automatically in 21 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1601)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1547)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:412)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:204)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43664)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1704)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:209)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:754)
    ... 10 more
{noformat}
> Possible NPE on regionserver abort if replication service has not been started
> ------------------------------------------------------------------------------
>
> Key: HBASE-8230
> URL: https://issues.apache.org/jira/browse/HBASE-8230
> Project: HBase
> Issue Type: Bug
> Components: regionserver, Replication
> Affects Versions: 0.94.6
> Reporter: Jieshan Bean
> Assignee: Jieshan Bean
> Attachments: HBASE-8230-94.patch
>
>
> The RegionServer hit an exception while calling setupWALAndReplication and
> entered the abort flow. Since replicationSink had not been initialized yet,
> we got the exception below:
> {noformat}
> Exception in thread "regionserver26003" java.lang.NullPointerException
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
>     at java.lang.Thread.run(Thread.java:662)
> {noformat}
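
The failure mode in the description (a shutdown path touching a component that was never created because startup aborted early) can be illustrated with a minimal sketch. Everything named here (ReplicationSketch, Sink, stop, startReplicationService) is a hypothetical stand-in, not the actual HBase code, and this is not necessarily what the attached patch does. It only shows the general idea of a null guard in the shutdown path:

```java
// Hypothetical stand-ins for illustration only; these are NOT the HBase classes.
class ReplicationSketch {

    /** Stand-in for a component that is only created once startup succeeds. */
    static class Sink {
        boolean stopped = false;

        void stop() {
            stopped = true;
        }
    }

    // Remains null if the server aborts before the service is started.
    private Sink replicationSink;

    /** Simulates a successful startup that creates the sink. */
    void startReplicationService() {
        replicationSink = new Sink();
    }

    /**
     * Null-safe shutdown: components that were never initialized are skipped.
     *
     * @return true if a sink existed and was stopped, false if there was nothing to stop
     */
    boolean join() {
        if (replicationSink == null) {
            return false; // startup never got this far, so nothing to stop
        }
        replicationSink.stop();
        return true;
    }
}
```

With a guard like this, calling join() during an early abort is a no-op instead of throwing a NullPointerException, while a normal shutdown after startup still stops the sink.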