[ https://issues.apache.org/jira/browse/HADOOP-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs resolved HADOOP-12918.
----------------------------------------
    Resolution: Not A Problem

So, as it turns out, at least some of what I saw was actually caused by a 
bad masquerade configuration in my firewalld service. MiniDFSCluster listens 
on the loopback address, which has been a bit buggy with firewalld in the 
past: https://bugzilla.redhat.com/show_bug.cgi?id=904098 (the problem seems 
to persist on RHEL/CentOS even though it's fixed in Fedora). Turning off 
masquerading (or turning off the firewall entirely) made MiniDFSCluster 
happy again.
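
In case anyone else runs into this, checking and clearing the masquerade 
setting looked roughly like the following (firewalld's CLI; I'm assuming the 
default "public" zone here, which may differ on your setup):

{code}
# Check whether masquerading is enabled for the zone
firewall-cmd --zone=public --query-masquerade
# Remove it from the running configuration, then persist the change
firewall-cmd --zone=public --remove-masquerade
firewall-cmd --runtime-to-permanent
{code}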

However, regardless of the firewall, the storage ID continues to be based 
on the host's eth0 IP address, even though the DataNode only listens on the 
loopback. On reflection, that's probably desirable, because different 
servers shouldn't collide on storage IDs. In any case, it only stands out 
with MiniDFSCluster; in any other deployment it wouldn't matter.
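
Out of curiosity, I compared the loopback address against what Hadoop 
reports for the default interface. Here's a minimal sketch using 
org.apache.hadoop.net.DNS; I haven't traced the exact call site that builds 
the storage ID, so treat this as illustrative only:

{code:java}
import java.net.InetAddress;

import org.apache.hadoop.net.DNS;

public class DefaultIpCheck {
  public static void main(String[] args) throws Exception {
    // The address MiniDFSCluster binds to
    System.out.println("loopback:   " + InetAddress.getLoopbackAddress().getHostAddress());
    // What Hadoop reports for the "default" interface; on my EC2 host this
    // is the eth0 address (172.31.3.214) that shows up in the storage ID
    System.out.println("default IP: " + DNS.getDefaultIP("default"));
  }
}
{code}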

So, the short answer: all of the symptoms except the storage ID were my 
firewalld's fault, and none of them seem to present a real problem that 
needs to be addressed.

> MiniDFSCluster uses wrong IP address
> ------------------------------------
>
>                 Key: HADOOP-12918
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12918
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.2.0, 2.6.1, 2.6.3
>            Reporter: Christopher Tubbs
>
> MiniDFSCluster seems to be registering the DataNode using the machine's 
> internal IP address, rather than "localhost/127.0.0.1". It looks like the 
> problem isn't MiniDFSCluster-specific, but that's what's biting me right 
> now, and I can't figure out a workaround.
> MiniDFSCluster logs show roughly the following (Jetty services ignored):
> - NameNode starts org.apache.hadoop.ipc.Server listening on localhost/127.0.0.1:43023
> - DataNode reports "Configured hostname is 127.0.0.1"
> - DataNode reports "Opened streaming server at /127.0.0.1:57310"
> - DataNode starts org.apache.hadoop.ipc.Server listening on localhost/127.0.0.1:53015
> - DataNode registers with the NN using storage ID DS-XXXXXXXXX-172.31.3.214-57310-XXXXXXXXXXXXX with ipcPort=53015
> - NameNode reports "Adding a new node: /default-rack/172.31.3.214:57310"
> The storage ID should have been derived from 127.0.0.1, and so should all 
> the other registered information.
> I've verified with netstat that all services were listening only on 
> 127.0.0.1.
> This resulted in the client being unable to write blocks to the DataNode, 
> because it was not listening on the address the NameNode gave to the 
> client (the address it was registered under).
> The actual client error message is:
> {code:java}
> [IPC Server handler 0 on 43023] INFO  org.apache.hadoop.hdfs.StateChange  - BLOCK* allocateBlock: /test-dir/HelloWorld.jar. BP-460569874-172.31.3.214-1457727894640 blk_1073741825_1001{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.31.3.214:57310|RBW]]}
> [Thread-61] INFO  org.apache.hadoop.hdfs.DFSClient  - Exception in createBlockOutputStream
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>   at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> [Thread-61] INFO  org.apache.hadoop.hdfs.DFSClient  - Abandoning BP-460569874-172.31.3.214-1457727894640:blk_1073741825_1001
> [Thread-61] INFO  org.apache.hadoop.hdfs.DFSClient  - Excluding datanode 172.31.3.214:57310
> [IPC Server handler 2 on 43023] WARN  org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy  - Not able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger
> [IPC Server handler 2 on 43023] ERROR org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:christopher (auth:SIMPLE) cause:java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> [IPC Server handler 2 on 43023] INFO  org.apache.hadoop.ipc.Server  - IPC Server handler 2 on 43023, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 172.31.3.214:57395 Call#12 Retry#0: error: java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> java.io.IOException: File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> [Thread-61] WARN  org.apache.hadoop.hdfs.DFSClient  - DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-dir/HelloWorld.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> {code}
> Additional information:
> I've tried Hadoop 2.2.0, 2.6.1, and 2.6.3, with the same results; it 
> probably affects other versions as well.
> I do not see this problem running locally, only in EC2, but I've yet to 
> find a relevant networking configuration difference that would have any 
> effect (no extra entries in /etc/hosts, no DNS issues, etc.).
> I can reproduce this easily by trying to build Accumulo's master branch 
> (HEAD at db21315) with `mvn clean package -Dtest=VfsClassLoaderTest 
> -DfailIfNoTests=false -Dhadoop.version=2.6.3`. A minimal standalone 
> version of what's failing is sketched below.
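> The following is a hypothetical test class (not the Accumulo test) that 
> exercises the same code path, roughly:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.MiniDFSCluster;
>
> public class MiniDFSClusterRepro {
>   public static void main(String[] args) throws Exception {
>     // Start a single-DataNode mini cluster; it binds its services to the
>     // loopback address
>     MiniDFSCluster cluster =
>         new MiniDFSCluster.Builder(new Configuration()).numDataNodes(1).build();
>     try {
>       FileSystem fs = cluster.getFileSystem();
>       // Writing a file allocates a block; the client then connects to the
>       // DataNode at the address the NameNode registered, which on EC2 is
>       // the eth0 address, and the connection is refused
>       try (FSDataOutputStream out = fs.create(new Path("/test-dir/HelloWorld.jar"))) {
>         out.write(new byte[] {1, 2, 3});
>       }
>     } finally {
>       cluster.shutdown();
>     }
>   }
> }
> {code}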


