[ https://issues.apache.org/jira/browse/HBASE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658516#action_12658516 ]
Thibaut commented on HBASE-1078:
--------------------------------
At least one datanode is logging lots of DataXceiver exceptions. (I had removed
the hadoop config entry for this; I will set dfs.datanode.max.xcievers to 2048
and retry.)
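For reference, roughly the change I mean; a sketch of the property as I plan to
add it to conf/hadoop-site.xml on each datanode (hadoop 0.19 still uses the
single hadoop-site.xml), followed by a datanode restart. The default limit is
256 here, if I remember right:

  <property>
    <!-- upper bound on concurrent DataXceiver threads per datanode -->
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>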
08/12/22 12:33:07 INFO datanode.DataNode: writeBlock blk_2596678729303248869_20816 received exception java.io.EOFException: while trying to read 65557 bytes
08/12/22 12:33:07 ERROR datanode.DataNode: DatanodeRegistration(192.168.0.7:50010, storageID=DS-1147686120-78.46.82.211-50010-1229624756268, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:298)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
    at java.lang.Thread.run(Thread.java:619)
08/12/22 12:33:07 INFO datanode.DataNode: PacketResponder blk_-3816193612223297244_20739 1 Exception java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:836)
    at java.lang.Thread.run(Thread.java:619)
08/12/22 12:33:07 INFO datanode.DataNode: PacketResponder blk_-3816193612223297244_20739 1 : Thread is interrupted.
08/12/22 12:33:07 INFO datanode.DataNode: PacketResponder 1 for block blk_-3816193612223297244_20739 terminating
08/12/22 12:33:07 INFO datanode.DataNode: writeBlock blk_-3816193612223297244_20739 received exception java.io.EOFException: while trying to read 65557 bytes
08/12/22 12:33:07 ERROR datanode.DataNode: DatanodeRegistration(192.168.0.7:50010, storageID=DS-1147686120-78.46.82.211-50010-1229624756268, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:298)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
    at java.lang.Thread.run(Thread.java:619)
08/12/22 12:33:10 INFO datanode.DataNode: PacketResponder blk_5759147435898729286_20763 1 Exception java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:836)
    at java.lang.Thread.run(Thread.java:619)
08/12/22 12:33:10 INFO datanode.DataNode: PacketResponder blk_5759147435898729286_20763 1 : Thread is interrupted.
08/12/22 12:33:10 INFO datanode.DataNode: PacketResponder 1 for block blk_5759147435898729286_20763 terminating
08/12/22 12:33:10 INFO datanode.DataNode: writeBlock blk_5759147435898729286_20763 received exception java.io.EOFException: while trying to read 65557 bytes
08/12/22 12:33:10 ERROR datanode.DataNode: DatanodeRegistration(192.168.0.7:50010, storageID=DS-1147686120-78.46.82.211-50010-1229624756268, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:254)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:298)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
    at java.lang.Thread.run(Thread.java:619)
This might be what is causing the HBase failure.
When I try to shut down the cluster, two region servers hang.
Below is the thread dump of one of them; the other one is identical.
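For reference, the dump was captured the usual way; a rough sketch, assuming the
JDK's jps/jstack are on the path and <pid> stands for the hung HRegionServer
process:

  jps | grep HRegionServer    # find the region server's pid
  jstack <pid>                # print the full thread dump
  # alternatively: kill -QUIT <pid> writes the same dump to the JVM's stdout
  # (the region server's .out log)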
2008-12-22 12:38:16
Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b23 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00002aab354cac00 nid=0x7af7 waiting on
condition [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"pool-1-thread-1" prio=10 tid=0x00002aab35a2b400 nid=0x22b0 waiting on
condition [0x0000000042a52000..0x0000000042a52aa0]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaab554efc0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:582)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:575)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:946)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
at java.lang.Thread.run(Thread.java:619)
"DestroyJavaVM" prio=10 tid=0x00002aab381c8000 nid=0x2275 waiting on condition
[0x0000000000000000..0x000000004022ad10]
java.lang.Thread.State: RUNNABLE
"Low Memory Detector" daemon prio=10 tid=0x00002aab3483c000 nid=0x227e runnable
[0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x00002aab34839800 nid=0x227d waiting on
condition [0x0000000000000000..0x0000000040a31440]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x00002aab34836000 nid=0x227c waiting on
condition [0x0000000000000000..0x00000000409303c0]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00002aab34834c00 nid=0x227b runnable
[0x0000000000000000..0x0000000040830730]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00002aab34628400 nid=0x227a in Object.wait()
[0x000000004072f000..0x000000004072fa20]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
- locked <0x00002aaab535e3f8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x00002aab34627000 nid=0x2279 in
Object.wait() [0x000000004062e000..0x000000004062eda0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00002aaab5290178> (a java.lang.ref.Reference$Lock)
"VM Thread" prio=10 tid=0x00002aab34621c00 nid=0x2278 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x2276
runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011f400 nid=0x2277
runnable
"VM Periodic Task Thread" prio=10 tid=0x00002aab3483dc00 nid=0x227f waiting on
condition
JNI global references: 1001
There is nothing suspicious in the region server's log file; it just ends with:
2008-12-22 12:26:16,117 INFO [regionserver/0:0:0:0:0:0:0:0:60020] regionserver.HRegionServer(508): telling master that region server is shutting down at: 192.168.0.7:60020
2008-12-22 12:26:16,122 INFO [regionserver/0:0:0:0:0:0:0:0:60020] regionserver.HRegionServer(515): stopping server at: 192.168.0.7:60020
2008-12-22 12:26:16,545 INFO [regionserver/0:0:0:0:0:0:0:0:60020] regionserver.HRegionServer(525): regionserver/0:0:0:0:0:0:0:0:60020 exiting
> "java.io.IOException: Could not obtain block": allthough file is there and
> accessible through the dfs client
> ------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1078
> URL: https://issues.apache.org/jira/browse/HBASE-1078
> Project: Hadoop HBase
> Issue Type: Bug
> Environment: hadoop 0.19.0
> hbase 0.19.0-dev, r728134
> Reporter: Thibaut
> Attachments: errorlogs.zip
>
>
> Hi,
> after doing some more stress testing, my cluster just stopped working.
> The region server responsible for the ROOT region can't read a block belonging
> to the ROOT region, but the block is definitely there, as I can read the file
> through the dfs client.
> All new clients fail to start:
> java.io.IOException: java.io.IOException: Could not obtain block: blk_-3504243288385983835_18732 file=/hbase/-ROOT-/70236052/info/mapfiles/780254459775584115/data
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>     at java.io.DataInputStream.readInt(DataInputStream.java:370)
>     at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1895)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1925)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1830)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
>     at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:517)
>     at org.apache.hadoop.hbase.regionserver.HStore.rowAtOrBeforeFromMapFile(HStore.java:1709)
>     at org.apache.hadoop.hbase.regionserver.HStore.getRowKeyAtOrBefore(HStore.java:1681)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1072)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1466)
>     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:894)
>     at sun.reflect.GeneratedConstructorAccessor13.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:95)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:550)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:450)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:422)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:559)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:454)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:415)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:113)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:96)
> Clients that are already connected still work (presumably because they have
> cached the ROOT region location?).
> This seems to have happened after one of the region servers (number 3) shut
> itself down due to exceptions (EOFException, "Unable to create new block",
> etc.; see the log file). The ROOT region then probably got moved to region
> server 2.
> I attached the logs (DEBUG enabled) of the hdfs namenode, the hbase master,
> and the log files of region servers 2 and 3.
> The filesystem is in a healthy state. I can also download the file through
> the hadoop fs command without any problem and without any error message
> about missing blocks.
> Status: HEALTHY
> Total size: 142881532319 B (Total open files size: 12415139840 B)
> Total dirs: 4153
> Total files: 3541 (Files currently being written: 106)
> Total blocks (validated): 5208 (avg. block size 27435010 B) (Total open file blocks (not validated): 205)
> Minimally replicated blocks: 5208 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 4
> Average block replication: 4.0
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 7
> Number of racks: 1
> The filesystem under path '/' is HEALTHY
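For reference, the dfs-side checks described above were along these lines; a
sketch where the local destination path is made up and only the stock hadoop
CLI is assumed:

  # read the file whose block HBase claims it cannot obtain
  hadoop fs -get /hbase/-ROOT-/70236052/info/mapfiles/780254459775584115/data /tmp/root-mapfile-data
  # overall health report (output quoted above)
  hadoop fsck /
  # per-file block and location detail for the ROOT region
  hadoop fsck /hbase/-ROOT- -files -blocks -locations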