[
https://issues.apache.org/jira/browse/HBASE-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517857#comment-15517857
]
Carol Pearson commented on HBASE-12074:
---------------------------------------
I've recently encountered this bug as well when using Trafodion with a large
table load (5.5 billion rows) and HBase 1.0.0-cdh5.4.5.
2016-09-22 05:20:03,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for TRAFODION.JAVABENCH.OE_ORDERLINE_18432,\x00\x00\x00*\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1474490067039.f1ea6356c0c99c7d7ad1a531e003d9cd., current region memstore size 516.50 MB, and 1/2 column families' memstores are being flushed.
2016-09-22 05:20:03,211 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing Column Family: 0000001 which was occupying 516.73 MB of memstore.
2016-09-22 05:20:04,494 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 181 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:06,318 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 153 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:07,154 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 149 ms, current pipeline: [DatanodeInfoWithStorage[172.31.58.52:50010,DS-2faa071d-835d-405c-9246-6c43dd71ddb4,DISK], DatanodeInfoWithStorage[172.31.54.58:50010,DS-f7da0d2e-4b9c-4dad-99ed-0009345f9410,DISK]]
2016-09-22 05:20:07,860 ERROR org.apache.hadoop.hbase.regionserver.wal.FSHLog: Error syncing, request close of wal
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
... 2 more
2016-09-22 05:20:07,869 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/isleroyale03.cluster.local,60020,1474421718304/isleroyale03.cluster.local%2C60020%2C1474421718304.null0.1474521602486 with entries=2020, filesize=123.71 MB; new WAL /hbase/WALs/isleroyale03.cluster.local,60020,1474421718304/isleroyale03.cluster.local%2C60020%2C1474421718304.null0.1474521607715
2016-09-22 05:20:07,870 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server isleroyale03.cluster.local,60020,1474421718304: IOE in log roller
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
... 2 more
> TestLogRollingNoCluster#testContendedLogRolling() failed
> --------------------------------------------------------
>
> Key: HBASE-12074
> URL: https://issues.apache.org/jira/browse/HBASE-12074
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Stephen Yuan Jiang
>
> TestLogRollingNoCluster#testContendedLogRolling() failed on a 0.98 run. I am
> trying to understand the context.
> The failure is this:
> {code}
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertFalse(Assert.java:64)
> at org.junit.Assert.assertFalse(Assert.java:74)
> at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:80)
> {code}
> It was caused by one of the Appenders calling FSHLog.sync(), which threw an
> IOE because of a concurrent close:
> {code}
> 2014-09-23 16:36:39,530 FATAL [pool-1-thread-1-WAL.AsyncSyncer0] wal.FSHLog$AsyncSyncer(1246): Error while AsyncSyncer sync, request close of hlog
> java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> ... 2 more
> 2014-09-23 16:36:39,531 INFO [32] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:32
> java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> ... 2 more
> 2014-09-23 16:36:39,532 INFO [19] wal.TestLogRollingNoCluster$Appender(137): Caught exception from Appender:19
> java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:168)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1241)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:165)
> ... 2 more
> {code}
> The code is:
> {code}
> public void sync() throws IOException {
>   try {
>     this.output.flush();
>     this.output.sync();
>   } catch (NullPointerException npe) {
>     // Concurrent close...
>     throw new IOException(npe);
>   }
> }
> {code}
> I think the test case was written exactly to catch this case:
> {code}
> * Spin up a bunch of threads and have them all append to a WAL. Roll the
> * WAL frequently to try and trigger NPE.
> {code}
> I am reporting this because I don't have much context. It may not be a
> test issue, but an actual bug.
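
For anyone else trying to reason about the race, here is a minimal, self-contained sketch of one way a sync() could tolerate a concurrent close. It is not the HBase code and not necessarily how this issue should be fixed. SketchWriter, runContended and the use of ByteArrayOutputStream are hypothetical stand-ins I made up for illustration; the only idea taken from the quoted sync() is that the output field can become null while another thread is syncing. The sketch reads the field once into a local and treats null as "already closed", then runs several syncing threads while the main thread keeps rolling the writer, in the spirit of the test description quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class GuardedSyncSketch {

    /** Hypothetical stand-in for a WAL writer; NOT HBase's ProtobufLogWriter. */
    static final class SketchWriter {
        // volatile so that a close() on another thread is visible to syncers
        private volatile OutputStream output = new ByteArrayOutputStream();

        void sync() throws IOException {
            OutputStream out = this.output; // read the field exactly once
            if (out == null) {
                return; // assumption: a closed writer has nothing left to sync
            }
            out.flush();
        }

        void close() throws IOException {
            OutputStream out = this.output;
            this.output = null; // this is what a racing sync() may observe
            if (out != null) {
                out.close();
            }
        }
    }

    /** Syncs from several threads while rolling the writer; returns the failure count. */
    public static int runContended(int syncers, int rolls) throws Exception {
        AtomicReference<SketchWriter> current = new AtomicReference<>(new SketchWriter());
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicInteger failures = new AtomicInteger();
        Thread[] threads = new Thread[syncers];
        for (int i = 0; i < syncers; i++) {
            threads[i] = new Thread(() -> {
                while (!stop.get()) {
                    try {
                        current.get().sync();
                    } catch (IOException | RuntimeException e) {
                        failures.incrementAndGet(); // an NPE would be counted here
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < rolls; i++) {
            // "Roll": publish a fresh writer first, then close the old one.
            SketchWriter old = current.getAndSet(new SketchWriter());
            old.close();
        }
        stop.set(true);
        for (Thread t : threads) {
            t.join();
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("failures=" + runContended(8, 10_000));
    }
}
```

Note that silently returning from sync() on a closed writer only avoids the exception. Whether edits appended just before the roll are durable is a separate question that this sketch does not address.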