Hello.

Greetings to everyone.
For some time now we have been testing the HDFS filesystem (without Map/Reduce) for our cluster setup. It is mostly OK. I encountered a FreeBSD support issue (https://issues.apache.org/jira/browse/HADOOP-7294), which I worked around by creating a custom shell script "stat" command that mimics Linux behaviour.
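
To illustrate the kind of workaround meant here, below is a minimal sketch, not the actual script. It assumes that the only GNU-style option that needs handling is -c%h (hard link count), which BSD stat spells -f%l. The function name translate_stat_arg and the wrapper layout in the trailing comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the argument translation inside a "stat" wrapper.
# Assumption: only the GNU form -c%h (hard link count) has to be mapped
# to its BSD equivalent -f%l; every other argument passes through as-is.
translate_stat_arg() {
    case "$1" in
        -c%h) printf '%s\n' '-f%l' ;;   # GNU %h (hard links) -> BSD %l
        *)    printf '%s\n' "$1" ;;     # file names and other options
    esac
}

# A wrapper placed ahead of the system stat in PATH could then do
# something like (the path is an assumption):
#   exec /usr/bin/stat "$(translate_stat_arg "$1")" "$2"
```

The translation is kept in a separate function so it can be checked without touching the real stat binary.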

Now I have decided to look at the reliability/HA options for the name node.
As far as I can see, the secondary name node should be replaced with either a checkpoint node or a backup node. Unfortunately, the documentation does not show any pros or cons of the two options. So I decided to go the backup node way. It was strange to me that the default configuration still uses the deprecated secondary name node setup. Also, all the scripts are written for the secondary name node and not for the "new way".
I did the following:
1) I chose one node to be my backup node.
2) I created a conf/backup file that contains my backup node hostname.
3) I replaced the last line of the start-dfs.sh script (the line that started the secondary name node) with the following line:
"$HADOOP_COMMON_HOME"/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts backup --script "$bin"/hdfs start namenode -backup $nameStartOpt
4) I added the dfs.http.address property to my hdfs-site.xml file. Until I did so, my backup node could not find my name node.
5) I created the name directory on the backup host.
6) I stopped the secondary name node.
7) I tried to start my backup node.
Now I am getting the following messages on the backup node:
java.io.FileNotFoundException: http://backup:50070/getimage?putimage=1&port=50105&machine=10.112.0.213&token=-24:1842738969:0:1307458998000:1307458455405
and the following on the name node:
2011-06-07 15:40:51,534 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Inconsistent checkpoint fields.
LV = -24 namespaceID = 1842738969 cTime = 0; checkpointTime = 1307458455405.
Expecting respectively: -24; 1842738969; 0; 1307458455406
Is this OK? Will they sync after some time?
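
For reference, the only field that differs between the two sides is checkpointTime. A small sketch that pulls it out of the token in the URL and compares it with the value the name node expected is given below. It assumes that the last colon-separated field of the token is checkpointTime, which matches the value printed in the name node log; the variable names are only illustrative.

```shell
#!/bin/sh
# Token copied from the getimage URL reported on the backup node.
token='-24:1842738969:0:1307458998000:1307458455405'
# Value the name node says it was expecting.
expected_checkpoint_time=1307458455406

# Assumption: the last colon-separated field of the token is checkpointTime.
sent_checkpoint_time=${token##*:}

# Difference between the expected and the sent checkpoint time, in ms.
diff_ms=$((expected_checkpoint_time - sent_checkpoint_time))
echo "checkpointTime differs by ${diff_ms} ms"
```

So the two checkpoint times are exactly one millisecond apart, while layout version, namespaceID and cTime all match.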

BTW: Is it correct that http://hadoop.apache.org/common/docs/current/index.html points to the 0.20 and not the 0.21 documentation?

Best regards, Vitalii Tymchyshyn
