Hi,

Topologies are unfortunately cached in Hadoop today and not refreshed until restart. So if you add a DN before properly configuring its topology, the improper default will stick and you'll need to restart the NN to clear that cache. Hence, your second process of configuring first and starting last makes more sense, roughly in the order sketched below.
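(Just a sketch, reusing the files and commands from your mail; I am assuming topology.sh lives under /etc/hadoop and that you run this as a user allowed to edit those files.)

    # 1. Register the host and its rack mapping BEFORE the DN first starts
    echo "qa-str-ms02.p-qa" >> /etc/hadoop/conf/hosts.include
    echo "qa-str-ms02.p-qa /dc1/switch1/rack1/node5" >> /etc/hadoop/topology.data
    echo "192.168.159.52 /dc1/switch1/rack1/node5" >> /etc/hadoop/topology.data

    # 2. Verify the script resolves the new host to the intended rack
    /etc/hadoop/topology.sh qa-str-ms02.p-qa

    # 3. Have the NN re-read the includes file, then start the DN last
    sudo -u hdfs hdfs dfsadmin -refreshNodes
    sudo /etc/init.d/hadoop-hdfs-datanode start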
Perhaps you may also want to configure a better default that suits your regular topology levels, such that even if you make a mistake the node still starts up; see the script sketch below.
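A minimal sketch of such a script, assuming a flat host-to-rack lookup file like your topology.data (the fallback path here is only an illustration; pick one matching your real rack depth):

    #!/bin/bash
    # Map each host/IP argument to its rack path. Unknown nodes fall back
    # to a default at the SAME depth as the known entries, so the NN never
    # sees a rack and a non-rack node at the same topology level.
    DATA=/etc/hadoop/topology.data
    DEFAULT=/dc1/switch1/rack1/default-node   # hypothetical fallback
    for host in "$@"; do
      path=$(awk -v h="$host" '$1 == h { print $2; exit }' "$DATA")
      echo "${path:-$DEFAULT}"
    done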
On Thu, Nov 22, 2012 at 7:07 AM, Maoke <fib...@gmail.com> wrote:
> Hi all,
>
> Does anyone have experience with adding a new datanode to a rack-aware
> cluster without restarting the namenode, on the CDH4 distribution? Adding
> a datanode is said to be a hot operation that can be done while the
> cluster is online.
>
> I tried it, but it did not seem to work until I restarted the namenode.
> What I did was (the cluster already had 4 datanodes and I was adding the
> 5th):
>
> 1. Add the new node (qa-str-ms02.p-qa) to /etc/hadoop/conf/hosts.include
> and to /etc/hadoop/conf/slaves.
> 2. Add the rack entries for qa-str-ms02.p-qa (192.168.159.52) to
> /etc/hadoop/topology.data, the file the topology script topology.sh
> checks, confirming that ./topology.sh qa-str-ms02.p-qa works. The rack
> entries look like:
>
> qa-str-ms02.p-qa /dc1/switch1/rack1/node5
> 192.168.159.52 /dc1/switch1/rack1/node5
>
> 3. On the namenode: sudo -u hdfs hdfs dfsadmin -refreshNodes
> 4. On the new datanode: sudo /etc/init.d/hadoop-hdfs-datanode start
>
> However, the datanode failed to handshake with the namenode and soon
> exited. The namenode log said:
>
> 2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.159.52:50010
> 2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.159.52:50010
> 2012-11-21 18:06:11,946 ERROR org.apache.hadoop.net.NetworkTopology: Error: can't add leaf node at depth 2 to topology:
> Number of racks: 3
> Expected number of leaves:3
> /dc1/switch1/rack1/node1/192.168.159.101:50010
> /dc1/switch1/rack1/node2/192.168.159.102:50010
> /dc1/switch1/rack1/node3/192.168.159.103:50010
> 2012-11-21 18:06:11,946 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 192.168.159.52:53968: error: org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
>         at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:365)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:619)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3358)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:854)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20018)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> It seems the newly added topology information did not take effect.
>
> When I changed the operation to the following steps:
>
> 1. Add the new node (qa-str-ms02.p-qa) to /etc/hadoop/conf/hosts.include
> and to /etc/hadoop/conf/slaves.
> 2. Add the rack entries for qa-str-ms02.p-qa to /etc/hadoop/topology.data,
> the file the topology script topology.sh checks, confirming that
> ./topology.sh qa-str-ms02.p-qa works.
> 3. On the namenode: sudo /etc/init.d/hadoop-hdfs-namenode stop && sudo
> /etc/init.d/hadoop-hdfs-namenode start
> 4. On the new datanode: sudo /etc/init.d/hadoop-hdfs-datanode start
>
> then everything was fine and the new node was added to the cluster
> according to dfsadmin -report.
>
> However, restarting the namenode is unwanted. Does anyone have any
> comments or recommendations? Thanks a lot in advance!
>
> - maoke

--
Harsh J