Race condition when rolling the HLog and hlogFlush
--------------------------------------------------

                 Key: HBASE-5623
                 URL: https://issues.apache.org/jira/browse/HBASE-5623
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 0.94.0
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar


When running a YCSB test with a large number of handlers 
(regionserver.handler.count=60), I get the following exceptions:

{code}
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
        at $Proxy1.multi(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
        at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
{code}

and 
{code}
        java.lang.NullPointerException
                at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
                at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
                at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
                at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
                at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
                at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
                at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
                at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
                at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
                at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
                at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
                at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
{code}

It seems the root cause of the issue is that HLog#rollWriter() opens a new log 
writer and closes the old one while holding the updateLock, but the other 
threads, inside syncer(), call
{code} 
logSyncerThread.hlogFlush(this.writer);
{code}
without holding the updateLock. LogSyncer only synchronizes against concurrent 
appends and flush(), but not on the passed writer, which may already have been 
closed by rollWriter(). In that case, since SequenceFile.Writer#close() sets its 
out field to null, we get the NPE.
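
To illustrate the kind of locking that would avoid this, here is a minimal, 
self-contained sketch. It is not the HBase code and not necessarily the fix that 
will be committed: SketchWriter and RollingLogSketch are hypothetical stand-ins 
for SequenceFile.Writer and HLog, and the sketch simply assumes that the flush 
path takes the same updateLock that the roll path holds, so a flush can never 
see a writer that has already been closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SequenceFile.Writer: close() nulls the output,
// so a later append() on the same instance throws NullPointerException.
class SketchWriter {
    private List<String> out = new ArrayList<>();

    void append(String edit) {
        out.add(edit); // NPE here if close() has already run
    }

    void close() {
        out = null;
    }
}

// Hypothetical stand-in for HLog. Both sync() and rollWriter() take
// updateLock, so the writer cannot be swapped and closed mid-flush.
class RollingLogSketch {
    private final Object updateLock = new Object();
    private final List<String> pending = new ArrayList<>();
    private SketchWriter writer = new SketchWriter();
    private int flushed = 0;

    // Buffer an edit; it reaches the writer on the next sync().
    void append(String edit) {
        synchronized (updateLock) {
            pending.add(edit);
        }
    }

    // Flush buffered edits to whichever writer is current under the lock.
    void sync() {
        synchronized (updateLock) {
            for (String edit : pending) {
                writer.append(edit);
            }
            flushed += pending.size();
            pending.clear();
        }
    }

    // Install the new writer before closing the old one, all under the lock.
    void rollWriter() {
        synchronized (updateLock) {
            SketchWriter old = writer;
            writer = new SketchWriter();
            old.close();
        }
    }

    int flushedCount() {
        synchronized (updateLock) {
            return flushed;
        }
    }
}
```

Holding the lock for the whole flush serializes syncs against rolls, which may 
cost throughput with many handlers; a real fix might prefer a narrower critical 
section or a separate lock around the writer.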

