Yongjun Zhang created HDFS-7236:
-----------------------------------

             Summary: TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in trunk
                 Key: HDFS-7236
                 URL: https://issues.apache.org/jira/browse/HDFS-7236
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Yongjun Zhang


Per the following report:
{code}
****Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-Hdfs-trunk
    THERE ARE 4 builds (out of 5) that have failed tests in the past 7 days, as listed below:

===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1898/testReport (2014-10-11 04:30:40)
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
    Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1897/testReport (2014-10-10 04:30:40)
    Failed test: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
    Failed test: org.apache.hadoop.tracing.TestTracing.testReadTraceHooks
    Failed test: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
    Failed test: org.apache.hadoop.tracing.TestTracing.testWriteTraceHooks
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1896/testReport (2014-10-09 04:30:40)
    Failed test: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeMetrics
    Failed test: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testRoundTripAckMetric
    Failed test: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testSendDataPacketMetrics
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
    Failed test: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testReceivePacketMetrics
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1895/testReport (2014-10-08 04:30:40)
    Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress

Among 5 runs examined, all failed tests <#failedRuns: testName>:
    4: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
    2: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
    2: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
    1: org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode.testDeadDatanode
...
{code}
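
For anyone who wants to re-derive the per-test tally at the end of the report, here is a minimal sketch. It assumes the report layout shown above ("===>" opens each build section, "Failed test:" precedes each test name). The function name tally_failures is hypothetical, and this is not the script that generated the report.

```python
import re
from collections import Counter


def tally_failures(report: str) -> Counter:
    """Count, for each test name, the number of builds in which it failed."""
    counts = Counter()
    # Each build section starts with '===>' followed by the build URL;
    # the text before the first '===>' is the report header, so skip it.
    for section in report.split("===>")[1:]:
        # \s* also spans a line break, so a test name that was wrapped
        # onto the line after 'Failed test:' is still captured.
        tests = set(re.findall(r"Failed test:\s*(\S+)", section))
        # A set ensures a test is counted at most once per build.
        counts.update(tests)
    return counts
```

Passing the report text to tally_failures and iterating over counts.most_common() should give a ranking comparable to the "Among 5 runs examined" summary, which is truncated in the report above.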

TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots failed in the two 
most recent trunk runs, so I am creating this jira for it. (The other two tests 
that failed more often were reported in separate jiras, HDFS-7221 and HDFS-7226.)

Symptom:

{code}
Error Message

Timed out waiting for Mini HDFS Cluster to start
Stacktrace

java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
        at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1194)
        at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1819)
        at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1789)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.doTestMultipleSnapshots(TestOpenFilesWithSnapshot.java:184)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots(TestOpenFilesWithSnapshot.java:162)
{code}

AND

{code}
2014-10-11 12:38:24,385 ERROR datanode.DataNode (DataXceiver.java:run(243)) - 127.0.0.1:55303:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:32949 dst: /127.0.0.1:55303
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:196)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:720)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:662)
{code}

AND

{code}
2014-10-11 12:38:28,552 WARN  datanode.DataNode (BPServiceActor.java:offerService(751)) - RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Got incremental block report from unregistered or dead node
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:3021)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:6355)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1137)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26304)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:639)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2125)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2121)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1640)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2119)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy18.blockReceivedAndDeleted(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:224)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks(BPServiceActor.java:307)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
        at java.lang.Thread.run(Thread.java:662)
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
