Hi all,

Does anyone have experience adding a new datanode to a rack-aware cluster without restarting the namenode, on the CDH4 distribution? Adding a datanode is said to be a hot operation that can be done while the cluster is online.

I tried that, but it did not work until I restarted the namenode. The cluster had 4 datanodes and I was adding a 5th. Here is what I did:

1. Added the new node (qa-str-ms02.p-qa) to /etc/hadoop/conf/hosts.include and to /etc/hadoop/conf/slaves.

2. Added the rack entries for qa-str-ms02.p-qa (192.168.159.52) to /etc/hadoop/topology.data, the file that the topology script (topology.sh) reads, and confirmed that ./topology.sh qa-str-ms02.p-qa works as expected. The rack entries look like:

    qa-str-ms02.p-qa /dc1/switch1/rack1/node5
    192.168.159.52 /dc1/switch1/rack1/node5

3. On the namenode:

    sudo -u hdfs hdfs dfsadmin -refreshNodes

4. On the new datanode:

    sudo /etc/init.d/hadoop-hdfs-datanode start

However, the datanode failed to handshake with the namenode and soon exited. The namenode log said:

    2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.159.52:50010
    2012-11-21 18:06:11,946 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.159.52:50010
    2012-11-21 18:06:11,946 ERROR org.apache.hadoop.net.NetworkTopology: Error: can't add leaf node at depth 2 to topology:
    Number of racks: 3
    Expected number of leaves:3
    /dc1/switch1/rack1/node1/192.168.159.101:50010
    /dc1/switch1/rack1/node2/192.168.159.102:50010
    /dc1/switch1/rack1/node3/192.168.159.103:50010
    2012-11-21 18:06:11,946 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 192.168.159.52:53968: error: org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
    org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
        at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:365)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:619)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3358)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:854)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20018)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

It seems that the newly added topology information was not picked up: the namenode placed the new node under /default-rack instead of /dc1/switch1/rack1/node5.

When I changed the procedure to the following steps:

1. Added the new node (qa-str-ms02.p-qa) to /etc/hadoop/conf/hosts.include and to /etc/hadoop/conf/slaves.

2. Added the rack entries for qa-str-ms02.p-qa to /etc/hadoop/topology.data, the file that the topology script (topology.sh) reads, and confirmed that ./topology.sh qa-str-ms02.p-qa works as expected.

3. On the namenode:

    sudo /etc/init.d/hadoop-hdfs-namenode stop && sudo /etc/init.d/hadoop-hdfs-namenode start

4. On the new datanode:

    sudo /etc/init.d/hadoop-hdfs-datanode start

then everything was fine, and the new node was added to the cluster according to dfsadmin -report. However, restarting the namenode is exactly what I want to avoid.
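
To make the depth mismatch in the log easier to see, here is a minimal Python sketch of a topology lookup and a depth check. It is not the actual topology.sh, and it is not how the namenode implements resolution. It assumes topology.data is a plain two-column "host rack-path" file, and the names parse_topology, resolve, leaf_depth and is_consistent are hypothetical helpers made up for illustration. The three existing entries are keyed by the IPs that appear in the log, since the real hostnames are not shown.

```python
# Sketch only: mimics a two-column topology.data lookup with a default-rack
# fallback, and compares leaf depths the way the error message describes them.

DEFAULT_RACK = "/default-rack"  # fallback rack seen in the namenode log

# Assumed file contents: existing nodes taken from the log, plus the new entries.
SAMPLE_TOPOLOGY_DATA = """\
192.168.159.101 /dc1/switch1/rack1/node1
192.168.159.102 /dc1/switch1/rack1/node2
192.168.159.103 /dc1/switch1/rack1/node3
qa-str-ms02.p-qa /dc1/switch1/rack1/node5
192.168.159.52 /dc1/switch1/rack1/node5
"""


def parse_topology(text):
    """Parse 'host rack-path' lines into a dict, skipping blanks and comments."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        host, rack = parts
        mapping[host] = rack
    return mapping


def resolve(mapping, host):
    """Return the rack path for host, or the default rack if it is unknown."""
    return mapping.get(host, DEFAULT_RACK)


def leaf_depth(rack_path):
    """Depth of a datanode leaf placed under rack_path (path components + 1)."""
    return len([p for p in rack_path.split("/") if p]) + 1


def is_consistent(rack_paths):
    """True if every leaf under the given rack paths sits at the same depth."""
    return len({leaf_depth(p) for p in rack_paths}) <= 1


if __name__ == "__main__":
    mapping = parse_topology(SAMPLE_TOPOLOGY_DATA)
    existing = [resolve(mapping, "192.168.159.%d" % n) for n in (101, 102, 103)]
    fresh = resolve(mapping, "192.168.159.52")
    print("fresh lookup:", fresh, "leaf depth", leaf_depth(fresh))
    print("default rack leaf depth:", leaf_depth(DEFAULT_RACK))
    print("consistent with fresh lookup:", is_consistent(existing + [fresh]))
    print("consistent with default rack:", is_consistent(existing + [DEFAULT_RACK]))
```

Under these assumptions a fresh lookup puts the new node at the same leaf depth as the existing ones, whereas the /default-rack placement from the log gives a leaf at depth 2, which is the depth named in the error message.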

Does anyone have any comments or recommendations? Thanks a lot in advance!

- maoke