Christopher Tubbs created HADOOP-12918:
------------------------------------------
Summary: MiniDFSCluster uses wrong IP address
Key: HADOOP-12918
URL: https://issues.apache.org/jira/browse/HADOOP-12918
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Affects Versions: 2.6.3, 2.6.1, 2.2.0
Reporter: Christopher Tubbs
MiniDFSCluster seems to be registering the DataNode using the machine's
internal IP address rather than "localhost/127.0.0.1". It looks like the
problem isn't MiniDFSCluster-specific, but that's what's biting me right now,
and I can't figure out a workaround.
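The actual failure comes out of Accumulo's VfsClassLoaderTest, but as far as I can tell it boils down to nothing more exotic than this (a minimal sketch using the stock MiniDFSCluster.Builder API from the hadoop-hdfs test jar, not the literal test code):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniDfsRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Plain single-DataNode mini cluster, no special networking config.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .build();
    cluster.waitActive();
    try {
      FileSystem fs = cluster.getFileSystem();
      // This is where the failure shows up: the client is handed the DataNode's
      // registered (internal EC2) address by the NameNode and the connect is refused.
      try (FSDataOutputStream out = fs.create(new Path("/test-dir/HelloWorld.jar"))) {
        out.writeBytes("hello world");
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}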
MiniDFSCluster logs show roughly the following (jetty services ignored):
* NameNode starts org.apache.hadoop.ipc.Server listening on localhost/127.0.0.1:43023
* DataNode reports "Configured hostname is 127.0.0.1"
* DataNode reports "Opened streaming server at /127.0.0.1:57310"
* DataNode starts org.apache.hadoop.ipc.Server listening on localhost/127.0.0.1:53015
* DataNode registers with NN using storage id DS-XXXXXXXXX-172.31.3.214-57310-XXXXXXXXXXXXX with ipcPort=53015
* NameNode reports "Adding a new node: /default-rack/172.31.3.214:57310"
The storage id should have been derived from 127.0.0.1, and so should all
the other registered information.
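The bad registration is also visible straight from the client API (a quick sketch, reusing the cluster variable from the snippet above; getDataNodeStats() just asks the NameNode for its datanode report):
{code:java}
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Sketch only: "cluster" is the MiniDFSCluster from the repro snippet above.
DistributedFileSystem dfs = cluster.getFileSystem();
for (DatanodeInfo dn : dfs.getDataNodeStats()) {
  // This should mirror what the NameNode logged: the internal
  // 172.31.3.214:57310 address (ipcPort=53015) rather than 127.0.0.1.
  System.out.println(dn.getXferAddr() + " ipcPort=" + dn.getIpcPort());
}
{code}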
I've verified with netstat that all services were listening only on 127.0.0.1.
This resulted in the client being unable to write blocks to the DataNode,
because the DataNode was not listening on the address the NameNode handed to
the client (the address the DataNode was registered under).
The actual client error message is:
{code:java}
[IPC Server handler 0 on 43023] INFO org.apache.hadoop.hdfs.StateChange - BLOCK* allocateBlock: /test-dir/HelloWorld.jar. BP-460569874-172.31.3.214-1457727894640 blk_1073741825_1001{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.31.3.214:57310|RBW]]}
[Thread-61] INFO org.apache.hadoop.hdfs.DFSClient - Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
[Thread-61] INFO org.apache.hadoop.hdfs.DFSClient - Abandoning BP-460569874-172.31.3.214-1457727894640:blk_1073741825_1001
[Thread-61] INFO org.apache.hadoop.hdfs.DFSClient - Excluding datanode 172.31.3.214:57310
[IPC Server handler 2 on 43023] WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy - Not able to place enough replicas, still in need of 1 to reach 1
For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger
[IPC Server handler 2 on 43023] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:christopher (auth:SIMPLE) cause:java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
[IPC Server handler 2 on 43023] INFO org.apache.hadoop.ipc.Server - IPC Server handler 2 on 43023, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.31.3.214:57395 Call#12 Retry#0: error: java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
[Thread-61] WARN org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
    at org.apache.hadoop.ipc.Client.call(Client.java:1347)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
{code}
Additional information:
I've tried with Hadoop 2.2.0, 2.6.1, and 2.6.3 with the same results; it
probably affects other versions as well.
I do not see this problem running locally, only in EC2, but I have not been
able to find a relevant networking configuration difference that would explain
it (no extra entries in /etc/hosts, no DNS issues, etc.).
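My current (unverified) guess is that the registration address comes from resolving the machine's hostname rather than from the address the DataNode actually binds to, which would explain why EC2 behaves differently from my workstation with otherwise identical setups. If that's right, the difference may be nothing more than what the stock JDK lookup returns on each machine:
{code:java}
import java.net.InetAddress;

public class LocalAddrCheck {
  public static void main(String[] args) throws Exception {
    // Unverified hypothesis: if hostname resolution drives the registered
    // address, this should print a loopback address on the machines that work
    // and the internal 172.31.x.x address on the EC2 instance that doesn't.
    InetAddress local = InetAddress.getLocalHost();
    System.out.println(local.getHostName() + " -> " + local.getHostAddress());
  }
}
{code}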
I can reproduce this easily by trying to build Accumulo's master branch (HEAD
at db21315) with `mvn clean package -Dtest=VfsClassLoaderTest
-DfailIfNoTests=false -Dhadoop.version=2.6.3`.
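If anyone wants to experiment, the only configuration knobs I'm aware of in this area are dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname. I'm only guessing they are related, and a MiniDFSCluster-based test really shouldn't need them, but for reference:
{code:java}
// Same imports as the repro sketch above. Speculative and untested as a
// workaround for this issue: prefer hostnames over the registered IP when
// clients and DataNodes connect to DataNodes.
Configuration conf = new Configuration();
conf.setBoolean("dfs.client.use.datanode.hostname", true);
conf.setBoolean("dfs.datanode.use.datanode.hostname", true);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(1)
    .build();
{code}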
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)