[
https://issues.apache.org/jira/browse/HDFS-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yongjun Zhang updated HDFS-10624:
---------------------------------
Description: (was: Seeing the following on DN log.
{code}
2016-04-07 20:27:45,416 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96465013 received exception java.io.EOFException: Premature EOF: no length prefix available
2016-04-07 20:27:45,416 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: rn2-lampp-lapp1115.rno.apple.com:1110:DataXceiver error processing WRITE_BLOCK operation src: /10.204.64.137:45112 dst: /10.204.64.151:1110
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2241)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:738)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
    at java.lang.Thread.run(Thread.java:745)
2016-04-07 20:27:46,116 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96458336 on /ngs8/app/lampp/dfs/dn
2016-04-07 20:27:46,117 ERROR org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/ngs8/app/lampp/dfs/dn, DS-a14baf2b-a1ef-4282-8d88-3203438e708e) exiting because of exception
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
    at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
2016-04-07 20:27:46,118 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/ngs8/app/lampp/dfs/dn, DS-a14baf2b-a1ef-4282-8d88-3203438e708e) exiting.
2016-04-07 20:27:46,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.204.64.151, datanodeUuid=6064994a-6769-4192-9377-83f78bd3d7a6, infoPort=0, infoSecurePort=1175, ipcPort=1120, storageInfo=lv=-56;cid=cluster6;nsid=1112595121;c=0):Failed to transfer BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96465013 to 10.204.64.10:1110 got
java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:190)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:585)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:758)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:705)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2154)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(DataNode.java:2884)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.transferBlock(DataXceiver.java:862)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opTransferBlock(Receiver.java:200)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:118)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
    ... 25 more
{code}
In particular,
{code}
2016-04-07 20:27:46,116 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1800173197-10.204.68.5-1444425156296:blk_1170125248_96458336 on /ngs8/app/lampp/dfs/dn
{code}
means the VolumeScanner/BlockScanner found the replica corrupt, or hit some
other issue. It would be very helpful to report the reason here, and, if the
replica is corrupt, also the offset of the first corrupt data (or chunk) in
the block and the total replica length. Creating this jira to request this
enhancement.
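The requested enhancement could look roughly like the sketch below: a small helper that builds the "Reporting bad" log line with the failure reason, first corrupt offset, and replica length attached. This is only an illustrative sketch, not the actual VolumeScanner code; the class name {{BadBlockReport}} and its fields are hypothetical and do not exist in Hadoop.

```java
// Hypothetical sketch of an enriched "Reporting bad" message for the
// VolumeScanner. None of these names are real Hadoop APIs; they only
// illustrate what extra information the log line could carry.
public class BadBlockReport {
    private final String blockId;           // e.g. "BP-...:blk_1170125248_96458336"
    private final String volumePath;        // e.g. "/ngs8/app/lampp/dfs/dn"
    private final Throwable cause;          // why the scan flagged the replica
    private final long firstCorruptOffset;  // offset of first bad chunk; -1 if unknown
    private final long replicaLength;       // total replica length on disk

    public BadBlockReport(String blockId, String volumePath, Throwable cause,
                          long firstCorruptOffset, long replicaLength) {
        this.blockId = blockId;
        this.volumePath = volumePath;
        this.cause = cause;
        this.firstCorruptOffset = firstCorruptOffset;
        this.replicaLength = replicaLength;
    }

    /** Builds the enriched log line proposed in this issue. */
    public String format() {
        StringBuilder sb = new StringBuilder();
        sb.append("Reporting bad ").append(blockId)
          .append(" on ").append(volumePath)
          .append(": reason=")
          .append(cause == null ? "unknown" : cause.toString());
        if (firstCorruptOffset >= 0) {
            sb.append(", firstCorruptOffset=").append(firstCorruptOffset);
        }
        sb.append(", replicaLength=").append(replicaLength);
        return sb.toString();
    }
}
```

With something like this, the WARN line quoted above would also say why the block was flagged (e.g. a checksum mismatch and where it starts), instead of only naming the block and volume.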
BTW, the NPE in the above log was resolved as HDFS-10512 (thanks Wei-Chiu and
Yiqun).
)
> VolumeScanner to report why a block is found bad
> ------------------------------------------------
>
> Key: HDFS-10624
> URL: https://issues.apache.org/jira/browse/HDFS-10624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, hdfs
> Reporter: Yongjun Zhang
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]