[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5623:
---------------------------------

    Attachment: 5623-suggestion.txt

This one works for me.
The observation is that only with the updateLock held are we guaranteed that 
this.writer is not null (as Enis pointed out correctly).

So we can read the writer while holding the lock, at which point it is known
to be non-null. If that writer has since been closed by a concurrent log roll,
we get an IOException, in which case we retry with the instance writer while
holding the lock.

Not entirely pretty, but it avoids using an AtomicReference everywhere.
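A minimal Java sketch of the pattern described above. It assumes, as the comment states, that syncing a writer which has already been closed throws an IOException. LogWriter, SimpleWriter, WalSketch and their methods are hypothetical stand-ins, not HBase's actual HLog or SequenceFileLogWriter API; only the names updateLock, this.writer, rollWriter() and syncer() are borrowed from the discussion.

```java
import java.io.IOException;

// Hypothetical writer abstraction (not HBase's HLog.Writer interface).
interface LogWriter {
    void sync() throws IOException; // assumed to throw once the writer is closed
    void close();
}

// Hypothetical writer that counts syncs and rejects them after close().
class SimpleWriter implements LogWriter {
    private boolean closed = false;
    int syncs = 0;

    @Override public synchronized void sync() throws IOException {
        if (closed) throw new IOException("writer closed");
        syncs++;
    }

    @Override public synchronized void close() { closed = true; }
}

class WalSketch {
    private final Object updateLock = new Object();
    private LogWriter writer; // only read or replaced while holding updateLock

    WalSketch(LogWriter initial) { this.writer = initial; }

    // Like rollWriter(): swap in a new writer and close the old one under the lock.
    void rollWriter(LogWriter next) {
        synchronized (updateLock) {
            LogWriter old = this.writer;
            this.writer = next;
            old.close();
        }
    }

    // Returns the writer that was actually synced.
    LogWriter syncer() throws IOException {
        LogWriter w;
        synchronized (updateLock) {
            w = this.writer; // guaranteed non-null while the lock is held
        }
        try {
            w.sync(); // common path: the slow sync runs without the lock
            return w;
        } catch (IOException e) {
            // w was most likely closed by a concurrent roll: retry against the
            // current instance writer while holding the lock, so no roll can
            // close it underneath us.
            synchronized (updateLock) {
                this.writer.sync();
                return this.writer;
            }
        }
    }
}
```

The lock is held only long enough to read the field on the common path; syncing under the lock happens only on the rare retry after a roll, which is the trade-off the comment describes against wrapping the writer in an AtomicReference.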
                
> Race condition when rolling the HLog and hlogFlush
> --------------------------------------------------
>
>                 Key: HBASE-5623
>                 URL: https://issues.apache.org/jira/browse/HBASE-5623
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.94.0
>
>         Attachments: 5623-suggestion.txt, 5623.txt, 5623v2.txt, 
> HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch
>
>
> When running a YCSB test with a large number of handlers 
> (regionserver.handler.count=60), I get the following exceptions:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
>       at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
>       at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
>       at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
>       at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
>       at $Proxy1.multi(Unknown Source)
>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
>       at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
> {code}
> and 
> {code}
>       java.lang.NullPointerException
>               at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
>               at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
>               at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
>               at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
>               at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
>               at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
>               at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
>               at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
>               at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
>               at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
>               at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
>               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>               at java.lang.reflect.Method.invoke(Method.java:597)
>               at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
>               at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
> {code}
> It seems the root cause of the issue is that HLog#rollWriter() opens a new 
> log writer and closes the old one while holding the updateLock, but other 
> threads in syncer() call
> {code} 
> logSyncerThread.hlogFlush(this.writer);
> {code}
> without holding the updateLock. LogSyncer only synchronizes against 
> concurrent appends and flush(), not on the passed writer, which may already 
> have been closed by rollWriter(). In that case, since 
> SequenceFile#Writer.close() sets its out field to null, we get the NPE. 
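The failing interleaving can be replayed deterministically on a single thread. This is a hypothetical sketch: NullingWriter only imitates the behavior described above (close() sets the out field to null), and RaceReplay is not HBase code; its updateLock and writer fields simply mirror the names in the description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for SequenceFile.Writer: close() nulls out the
// underlying stream, so any later append throws a NullPointerException.
class NullingWriter {
    private OutputStream out = new ByteArrayOutputStream();

    void append(byte[] record) throws IOException {
        out.write(record); // NullPointerException once close() has run
    }

    void close() throws IOException {
        if (out != null) {
            out.close();
            out = null; // mirrors the field being set to null on close
        }
    }
}

// Replays the interleaving step by step so the outcome is deterministic.
class RaceReplay {
    private final Object updateLock = new Object();
    private NullingWriter writer = new NullingWriter();

    String replay() throws IOException {
        // Step 1 (syncer thread): read this.writer WITHOUT holding updateLock.
        NullingWriter seen = this.writer;

        // Step 2 (roller thread): like rollWriter(), swap in a new writer and
        // close the old one while holding updateLock.
        synchronized (updateLock) {
            NullingWriter old = this.writer;
            this.writer = new NullingWriter();
            old.close();
        }

        // Step 3 (syncer thread): flush to the stale reference from step 1.
        try {
            seen.append(new byte[] {1, 2, 3});
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }
}
```

Here new RaceReplay().replay() returns "NPE": holding updateLock in the roller alone does not help, because the syncer captured its reference before the roll and never synchronized on the same lock.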
