[ https://issues.apache.org/jira/browse/HBASE-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-5079.
--------------------------
Resolution: Not a Problem
Resolving as not a problem; it is something we have not seen for years (I believe).
Thanks [~clehene]
> DistributedLogSplitter interrupt can be hazardous to regionserver health
> ------------------------------------------------------------------------
>
> Key: HBASE-5079
> URL: https://issues.apache.org/jira/browse/HBASE-5079
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Labels: delete
>
> The DLS interrupt can kill the regionserver if it happens while a
> conversation with the namenode is going on.
> The interrupt is used to end a task on the regionserver when it is done,
> whether it succeeded or not, or to stop an ongoing split because the task
> is presumed taken over by another server.
> I saw this issue while testing because I was killing servers. I was also
> suffering "HBASE-5078 DistributedLogSplitter failing to split file because
> it has edits for lots of regions", which made it more likely to happen.
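> For context, the preemption pattern at play is roughly the following (a
> minimal sketch, not the actual SplitLogWorker code; class and names are
> made up). Note that Thread.interrupt() only takes effect once the worker
> checks its interrupt flag or blocks in an interruptible call, which is why
> several interrupts can be sent before one lands:
> {code}
> // Sketch of interrupt-based task preemption, assuming a worker that
> // polls its interrupt status between units of split work.
> public class InterruptStopSketch {
>   public static void main(String[] args) throws InterruptedException {
>     Thread worker = new Thread(() -> {
>       while (!Thread.currentThread().isInterrupted()) {
>         try {
>           Thread.sleep(100); // stand-in for one unit of split work
>         } catch (InterruptedException e) {
>           Thread.currentThread().interrupt(); // restore flag for loop test
>         }
>       }
>       System.out.println("worker: task preempted, exiting cleanly");
>     });
>     worker.start();
>     Thread.sleep(300);
>     worker.interrupt(); // "Sending interrupt to stop the worker thread"
>     worker.join();
>   }
> }
> {code}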
> Here is what it looks like on the regionserver that died:
> {code}
> 2011-12-20 17:54:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463 preempted from sv4r13s38,7003,1324365396583, current task state and owner=owned sv4r27s44,7003,1324365396664
> 2011-12-20 17:54:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> 2011-12-20 17:54:59,133 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463 preempted from sv4r13s38,7003,1324365396583, current task state and owner=owned sv4r27s44,7003,1324365396664
> 2011-12-20 17:54:59,134 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> ...
> 2011-12-20 17:55:25,505 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463 preempted from sv4r13s38,7003,1324365396583, current task state and owner=unassigned sv4r11s38,7001,1324365395047
> 2011-12-20 17:55:25,505 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> {code}
> Three interrupts are sent over a period of 31 seconds or so.
> Eventually the interrupt has an effect and I get:
> {code}
> 2011-12-20 17:55:25,505 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
> 2011-12-20 17:55:48,022 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> 2011-12-20 17:55:58,070 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Call to sv4r11s38/10.4.11.38:7000 failed on local exception: java.nio.channels.ClosedByInterruptException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy9.addBlock(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy9.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
> Caused by: java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:779)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1047)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:875)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.convertRegionEditsToTemp(HLogSplitter.java:1097)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.createWAP(HLogSplitter.java:1066)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.convertRegionEditsToTemp(HLogSplitter.java:1097)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.createWAP(HLogSplitter.java:1066)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:410)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-12-20 17:55:58,070 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Could not prepare temp staging area
> java.io.IOException: Call to sv4r11s38/10.4.11.38:7000 failed on local exception: java.nio.channels.ClosedByInterruptException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:875)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.convertRegionEditsToTemp(HLogSplitter.java:1097)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.createWAP(HLogSplitter.java:1066)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:410)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:779)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1047)
>         ... 20 more
> {code}
> Now here is the wacky part. Above, it looks like we are talking to the
> namenode and the interrupt closes the socket out from under us. At the
> same time, I'm trying to flush, and the flush fails with the same stack
> trace ... and because the flush failed, the regionserver goes down (see
> the sketch after the log below):
> {code}
> 2011-12-20 17:55:58,071 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6239131583587622790_187561 bad datanode[0] nodes == null
> 2011-12-20 17:55:58,073 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/TestTable/9f98e6764d322832c845b740336e5750/.tmp/0df1673c96274028bda24c9cb49e9c3e" - Aborting...
> 2011-12-20 17:55:58,074 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r13s38,7003,1324365396583: Replay of HLog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TestTable,0134394898,1323822783216.9f98e6764d322832c845b740336e5750.
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1276)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1160)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1102)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Call to sv4r11s38/10.4.11.38:7000 failed on local exception: java.nio.channels.ClosedByInterruptException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy9.addBlock(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy9.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
> Caused by: java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:779)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1047)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy9.getFileInfo(Unknown Source)
>         ...
> {code}
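> The sketch promised above: interrupting a thread that is blocked in an
> interruptible NIO channel operation does not just raise
> ClosedByInterruptException, it also closes the channel, so anything else
> sharing that connection (here, the flush going through the same DFSClient
> connection to the namenode) fails too. A minimal, self-contained
> illustration, with a Pipe standing in for the client socket (names are
> made up):
> {code}
> import java.nio.ByteBuffer;
> import java.nio.channels.ClosedByInterruptException;
> import java.nio.channels.Pipe;
>
> public class ClosedByInterruptSketch {
>   public static void main(String[] args) throws Exception {
>     Pipe pipe = Pipe.open();
>     Thread reader = new Thread(() -> {
>       try {
>         // Blocks forever: nothing is ever written to the pipe.
>         pipe.source().read(ByteBuffer.allocate(1));
>       } catch (ClosedByInterruptException e) {
>         System.out.println("reader: " + e); // what the DFSClient thread sees
>       } catch (Exception e) {
>         e.printStackTrace();
>       }
>     });
>     reader.start();
>     Thread.sleep(200);
>     reader.interrupt(); // the DLS-style interrupt
>     reader.join();
>     // The channel is now closed for every user, not just the interrupted one.
>     System.out.println("channel still open? " + pipe.source().isOpen());
>   }
> }
> {code}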
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)