[ 
https://issues.apache.org/jira/browse/HBASE-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571053#comment-15571053
 ] 

Atri Sharma commented on HBASE-12074:
-------------------------------------

Could a possible fix be to make rollWriter get the zig-zag latch and call 
doReplaceWriter as the first operation, before attempting to close and flush 
the log files? This will lead new HLog Writer threads to see the newPath 
already set and not wait for the flush to happen, and the old file cleanup can 
happen as a background thread.

> TestLogRollingNoCluster#testContendedLogRolling() failed
> --------------------------------------------------------
>
>                 Key: HBASE-12074
>                 URL: https://issues.apache.org/jira/browse/HBASE-12074
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Stephen Yuan Jiang
>
> TestLogRollingNoCluster#testContendedLogRolling() failed on a 0.98 run. I am 
> trying to understand the context. 
> The failure is this: 
> {code}
> java.lang.AssertionError
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertFalse(Assert.java:64)
>       at org.junit.Assert.assertFalse(Assert.java:74)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:80)
> {code}
> Caused because one of the Appenders calling FSHLog.sync() threw IOE because 
> of concurrent close: 
> {code}
> 4-09-23 16:36:39,530 FATAL [pool-1-thread-1-WAL.AsyncSyncer0] 
> wal.FSHLog$AsyncSyncer(1246): Error while AsyncSyncer sync, request close of 
> hlog 
> java.io.IOException: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
>       ... 2 more
> 2014-09-23 16:36:39,531 INFO  [32] wal.TestLogRollingNoCluster$Appender(137): 
> Caught exception from Appender:32
> java.io.IOException: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
>       ... 2 more
> 2014-09-23 16:36:39,532 INFO  [19] wal.TestLogRollingNoCluster$Appender(137): 
> Caught exception from Appender:19
> java.io.IOException: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
>       ... 2 more
> {code}
> The code is: 
> {code}
>   public void sync() throws IOException {
>     try {
>       this.output.flush();
>       this.output.sync();
>     } catch (NullPointerException npe) {
>       // Concurrent close...
>       throw new IOException(npe);
>     }
>   }
> {code}
> I think the test case written exactly to catch this case: 
> {code}
>    * Spin up a bunch of threads and have them all append to a WAL.  Roll the
>    * WAL frequently to try and trigger NPE.
> {code}
> This is why I am reporting since I don't have much context. It may not be a 
> test issue, but an actual bug. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to