[
https://issues.apache.org/jira/browse/HDFS-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406354#comment-13406354
]
Junping Du commented on HDFS-3364:
----------------------------------
I saw this error in pre-commit test for HADOOP-8472, and also cannot reproduce
in local. The full log is
https://builds.apache.org/job/PreCommit-HADOOP-Build/1155//testReport/org.apache.hadoop.hdfs/TestFileAppend4/testCompleteOtherLeaseHoldersFile/
and the relevant log is as following:
{noformat}
2012-06-29 16:26:22,187 INFO ipc.Server (Server.java:stop(1991)) - Stopping
server on 46047
2012-06-29 16:26:22,424 DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(431)) - Sending heartbeat from service
actor: Block pool BP-166940465-67.195.138.20-1340987178877 (storage id
DS-851929818-67.195.138.20-36687-1340987179189) service to
localhost/127.0.0.1:49801
2012-06-29 16:26:22,424 DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(431)) - Sending heartbeat from service
actor: Block pool BP-166940465-67.195.138.20-1340987178877 (storage id
DS-1404243958-67.195.138.20-36424-1340987179356) service to
localhost/127.0.0.1:49801
2012-06-29 16:26:22,424 DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(431)) - Sending heartbeat from service
actor: Block pool BP-166940465-67.195.138.20-1340987178877 (storage id
DS-1672148922-67.195.138.20-53642-1340987179269) service to
localhost/127.0.0.1:49801
2012-06-29 16:26:22,425 INFO ipc.Server (Server.java:run(638)) - Stopping IPC
Server listener on 46047
2012-06-29 16:26:22,425 INFO ipc.Server (Server.java:run(780)) - Stopping IPC
Server Responder
2012-06-29 16:26:22,425 INFO datanode.DataNode (DataNode.java:shutdown(1068))
- Waiting for threadgroup to exit, active threads is 1
2012-06-29 16:26:22,428 WARN ipc.Server (Server.java:processResponse(979)) -
IPC Server Responder, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from
127.0.0.1:51257: output error
2012-06-29 16:26:22,429 INFO ipc.Server (Server.java:run(1745)) - IPC Server
handler 9 on 49801 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2131)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:107)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:930)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:994)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1738)
{noformat}
>From this log, the problem happens in shutting down 3rd datanodes as previous
>2 datanodes are shut down successfully. It looks like when shutting down
>ipcserver on 3rd datanode, it happens to do heartbeat for BPServiceActor in
>offerService() so some heartbeat exceptions are thrown out.
Also, the following code which is for interrupt offerService() is not taking
effective which means offerService() pending on somewhere else? like heartbeat
response or pendingIncrementalBR. Does adding catch InterruptedException(...)
in whole offerService() can help here?
{code}
synchronized(pendingIncrementalBR) {
if (waitTime > 0 && pendingReceivedRequests == 0) {
try {
pendingIncrementalBR.wait(waitTime);
} catch (InterruptedException ie) {
LOG.warn("BPOfferService for " + this + " interrupted");
}
}
}
{code}
> TestFileAppend4.testRecoverFinalizedBlock occasionally times out
> ----------------------------------------------------------------
>
> Key: HDFS-3364
> URL: https://issues.apache.org/jira/browse/HDFS-3364
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0-alpha
> Reporter: Eli Collins
>
> I've seen TestFileAppend4.testRecoverFinalizedBlock shutdown occasionally
> time out in jenkins. Doesn't fail for me locally.
> {noformat}
> test timed out after 60000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 60000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Thread.join(Thread.java:1186)
> at java.lang.Thread.join(Thread.java:1239)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:473)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:259)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:117)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1098)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1280)
> at
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1260)
> at
> org.apache.hadoop.hdfs.TestFileAppend4.testRecoverFinalizedBlock(TestFileAppend4.java:208)
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira