[ 
https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576543#comment-15576543
 ] 

Enis Soztutar commented on HBASE-16824:
---------------------------------------

I've been looking into this issue, which shows up as frequent exceptions in the 
log like the following: 
{code}
2016-10-14 14:20:55,253 ERROR [sync.2] wal.FSHLog$SyncRunner(636): Error syncing, request close of WAL
java.nio.channels.ClosedChannelException
        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1521)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1942)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1887)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:79)
        at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster$HighLatencySyncWriter.sync(TestLogRollingNoCluster.java:67)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:632)
        at java.lang.Thread.run(Thread.java:745)
2016-10-14 14:20:55,253 INFO  [95] wal.AbstractFSWAL(627): Rolled WAL /user/enis/test-data/30328473-7c72-4729-b1a5-5bc085516dd5/WALs/org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster/org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.1476480055179 with entries=6862, filesize=0 B; new WAL /user/enis/test-data/30328473-7c72-4729-b1a5-5bc085516dd5/WALs/org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster/org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.1476480055220
2016-10-14 14:20:55,254 INFO  [42] wal.TestLogRollingNoCluster$Appender(177): Caught exception from Appender:42
java.nio.channels.ClosedChannelException
        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1521)
{code} 

The same exception is also reported on the hbase-user list: 
https://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3c74ecffa8dc3b6847888649793c770fe0a2d67...@blreml510-mbs.china.huawei.com%3E.

It turns out that when we replace the WAL writer, we wait to attain a safe 
point between the LogRoller and the RingBufferEventHandler. However, there is 
no coordination between the log roller and the SyncRunner threads, which can 
still call writer.sync() on the writer that is being closed. This results in 
the above exception from HDFS, and causes some already sync'ed requests to 
raise exceptions back to the client (maybe a minor correctness issue for 
non-idempotent operations). 
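
To make the interleaving concrete, here is a minimal sketch of the race and of 
one possible way to coordinate the two sides. It is not HBase code and not 
necessarily the fix this issue will end up with: ToyWriter, WalRollRaceSketch, 
and the read/write lock are all hypothetical stand-ins chosen for illustration. 
The first method replays the bad ordering on a single thread (sync thread picks 
the writer, roller closes it, sync runs); the coordinated variant holds a read 
lock around sync() so the roller cannot close a writer while a sync is in 
flight. 

```java
import java.nio.channels.ClosedChannelException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins for illustration only; none of these are HBase classes.
final class WalRollRaceSketch {

    /** Toy writer whose sync() fails once closed, like a closed HDFS output stream. */
    static final class ToyWriter {
        private volatile boolean closed;

        void sync() throws ClosedChannelException {
            if (closed) {
                throw new ClosedChannelException();
            }
        }

        void close() {
            closed = true;
        }
    }

    private final AtomicReference<ToyWriter> writer = new AtomicReference<>(new ToyWriter());
    private final ReentrantReadWriteLock rollLock = new ReentrantReadWriteLock();

    /** Replays the bad ordering on one thread and reports whether sync() failed. */
    boolean unsafeInterleavingFails() {
        ToyWriter picked = writer.get();            // sync thread reads the current writer
        writer.getAndSet(new ToyWriter()).close();  // roller swaps in a new writer, closes the old
        try {
            picked.sync();                          // sync thread syncs the now-closed writer
            return false;
        } catch (ClosedChannelException e) {
            return true;
        }
    }

    /** Sync under the read lock: the roller cannot close the writer mid-sync. */
    void syncCoordinated() throws ClosedChannelException {
        rollLock.readLock().lock();
        try {
            writer.get().sync();
        } finally {
            rollLock.readLock().unlock();
        }
    }

    /** Roll under the write lock: waits for in-flight syncs before closing. */
    void rollCoordinated() {
        rollLock.writeLock().lock();
        try {
            writer.getAndSet(new ToyWriter()).close();
        } finally {
            rollLock.writeLock().unlock();
        }
    }

    /** Runs sync threads against a roller thread; returns how many syncs failed. */
    static int countCoordinatedFailures(int syncThreads, int syncsPerThread, int rolls) {
        WalRollRaceSketch wal = new WalRollRaceSketch();
        AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < syncThreads; i++) {
            threads.add(new Thread(() -> {
                for (int n = 0; n < syncsPerThread; n++) {
                    try {
                        wal.syncCoordinated();
                    } catch (ClosedChannelException e) {
                        failures.incrementAndGet();
                    }
                }
            }));
        }
        threads.add(new Thread(() -> {
            for (int n = 0; n < rolls; n++) {
                wal.rollCoordinated();
            }
        }));
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("unsafe interleaving fails: "
            + new WalRollRaceSketch().unsafeInterleavingFails());
        System.out.println("failed syncs with coordination: "
            + countCoordinatedFailures(4, 10_000, 200));
    }
}
```

A read/write lock is only one option here; it trades some sync throughput 
during a roll for the guarantee that no sync ever sees a closed writer. 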

I've modified TestLogRollingNoCluster by introducing an artificial delay, and I 
can reproduce this every time. 
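
For readers who want to try something similar, the sketch below shows the 
general shape of such a delay: a wrapper that sleeps before delegating sync(), 
which widens the window in which a roll can close the underlying writer. This 
is not the actual test change; Syncable, DelayedSync, and the delay value are 
hypothetical names and parameters used only for illustration. 

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the code in TestLogRollingNoCluster.
final class DelayedSyncSketch {

    /** Minimal stand-in for a writer that can be synced. */
    interface Syncable {
        void sync() throws IOException;
    }

    /** Sleeps for a fixed time before delegating, to widen the race window. */
    static final class DelayedSync implements Syncable {
        private final Syncable delegate;
        private final long delayMillis;

        DelayedSync(Syncable delegate, long delayMillis) {
            this.delegate = delegate;
            this.delayMillis = delayMillis;
        }

        @Override
        public void sync() throws IOException {
            try {
                Thread.sleep(delayMillis);  // artificial latency before the real sync
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while delaying sync", e);
            }
            delegate.sync();
        }
    }

    /** Calls the delayed writer several times; returns how many syncs reached the delegate. */
    static int countDelegatedSyncs(int calls, long delayMillis) {
        AtomicInteger delegated = new AtomicInteger();
        Syncable slow = new DelayedSync(delegated::incrementAndGet, delayMillis);
        for (int i = 0; i < calls; i++) {
            try {
                slow.sync();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return delegated.get();
    }

    public static void main(String[] args) {
        System.out.println("delegated syncs: " + countDelegatedSyncs(3, 5));
    }
}
```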

> Make replacement of path the first operation during WAL rotation
> ----------------------------------------------------------------
>
>                 Key: HBASE-16824
>                 URL: https://issues.apache.org/jira/browse/HBASE-16824
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Atri Sharma
>
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an 
> async thread calls flush on a WAL record already closed as the WAL is being 
> rotated. This JIRA investigates if setting the new WAL record path as the 
> first operation during WAL rotation will fix the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
