I figured it out. It turns out that one must not modify the configuration files while the cluster is running. If you do, the edits file can become corrupted. Fortunately the corruption is in the first word of the file, which is a magic number, so it is easily detected. The solution is to make sure the cluster is stopped before modifying the configuration.
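In practice that just means bracketing any config edit with a full stop and start. A sketch (the script names and paths assume a stock Hadoop 1.x tarball install; adjust for your layout):

    $HADOOP_HOME/bin/stop-dfs.sh               # make sure nothing is running
    vi $HADOOP_HOME/conf/core-site.xml         # now it is safe to edit
    $HADOOP_HOME/bin/start-dfs.sh              # new config is read at startup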
Is this a bug? I always thought configurations were read at initialization time and then not used again. That behavior lets changes take effect when the service restarts, which is the way things work with all sorts of Unix/Linux services.

Thanks for your help,
Hank Cohen

From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, June 28, 2012 5:03 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data

Hank,

I'm able to run my HDFS with two different sets of configs independently. Can you share your whole NN log? One name/data directory should not conflict with another, but in any case it is always good to set dfs.name.dir and dfs.data.dir to absolute paths instead of relying on hadoop.tmp.dir's implicitness. What I do is keep two different config dirs and pass the right one when I need to switch from the defaults.
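For example, each config dir's hdfs-site.xml pins its own storage tree (a sketch; /data/dir1 and the conf paths below are made up, and the second config dir would point at /data/dir2 instead):

    <property>
      <name>dfs.name.dir</name>
      <value>/data/dir1/hdfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/data/dir1/hdfs/data</value>
    </property>

Then start the daemons against whichever set you want to test:

    bin/start-dfs.sh --config /path/to/conf.dir1
    bin/stop-dfs.sh --config /path/to/conf.dir1
    bin/start-dfs.sh --config /path/to/conf.dir2

Each tree keeps its own consistent name and data directories, so switching back and forth never mixes them.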
On Thu, Jun 28, 2012 at 1:15 PM, Giulio D'Ippolito <giulio.dippol...@gmail.com> wrote:

You could manually edit the VERSION file so that the datanode and namenode namespaceIDs match.
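The file lives under each storage directory (e.g. dfs/name/current/VERSION and dfs/data/current/VERSION) and on a 1.x install looks roughly like this; the namespaceID value here is made up, and the datanode's copy says storageType=DATA_NODE instead:

    #Wed Jun 27 10:40:44 PDT 2012
    namespaceID=1394838573
    cTime=0
    storageType=NAME_NODE
    layoutVersion=-32

Copying the namenode's namespaceID into the datanode's VERSION file makes the two sides agree again.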
2012/6/27 Hank Cohen <hank.co...@altior.com>

[nit] First of all, I think the datanode storage location property should be simply dfs.data.dir, not dfs.datanode.data.dir (this from src/hdfs/hdfs-default.html).

Both the namenode storage directory and the datanode storage directory are defined relative to hadoop.tmp.dir, so simply changing that directory changes both subdirectories. But this doesn't let me change back and forth without errors. I get an error when I try to change hadoop.tmp.dir to a directory that already contains a Hadoop file system. The error is:

2012-06-27 10:40:44,144 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Unexpected version of the file system log file: -333643776. Current version = -32.

[Does anyone want to see the Java stack trace?]

When I look at the VERSION files (hadoop.tmp.dir/dfs/name/current/VERSION), the only difference I see is that namespaceID is different. I think namespaceID probably should be different; it is a different file system.

Thanks for any guidance,
Hank Cohen

From: Konstantin Shvachko [mailto:shv.had...@gmail.com]
Sent: Monday, June 18, 2012 5:12 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data

In hdfs-site.xml you should specify dfs.namenode.name.dir for NameNode storage directories and/or dfs.datanode.data.dir for DataNode storage. Changing the temporary directory location changes the default for the storage directories, which should also work. You might want to check the message the NameNode logs when it fails.

Thanks,
--Konstantin

On Mon, Jun 18, 2012 at 3:47 PM, Hank Cohen <hank.co...@altior.com> wrote:

I am trying to do some testing with different storage configurations for HDFS, but I am having difficulty changing the storage destination without having to re-initialize the whole file system each time I change things.

What I want to do: set up and run some test cases with two different local file system configurations. Think of it as having different local disks with different performance characteristics.

What I have done so far is to change the hadoop.tmp.dir property in core-site.xml. Let's call this dir1. I can set this up and format the file system without any problems, run my tests, shut down, and change core-site.xml again to dir2. Again I can format dir2 and run my tests OK, but when I try to switch back to dir1 I can't get the namenode to start. I find that I have to remove all of the directories and subdirectories from dir1, then reformat and start over with nothing in the file system.

Is there an easy way to do this without having to reinitialize the whole HDFS each time?

Hank Cohen
+1 732-440-1280 x320 Office
+1 510-995-8264 Direct
444 Route 35 South, Building B
Eatontown, NJ 07724 USA
hank.co...@altior.com
www.altior.com

--
Harsh J