I figured it out.
It turns out that one must not modify the configuration files while the cluster 
is running.
If you do, the edits file can become corrupted.  Fortunately the corruption is 
in the first word of the file, which is a magic number, so it is easily detected.
So the solution is to be sure that the cluster is stopped before modifying the 
configuration.
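For anyone who hits the same thing, the safe sequence is roughly the following 
(a sketch assuming the stock start/stop scripts shipped with Hadoop; adjust 
paths to your install):

  # stop HDFS before touching anything under the config directory
  bin/stop-dfs.sh

  # edit the configuration while nothing is running
  vi conf/core-site.xml conf/hdfs-site.xml

  # restart so the new settings are read at initialization
  bin/start-dfs.sh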

Is this a bug?  I had always thought that configurations were read at 
initialization time and then not used again.
That behavior lets changes take effect when the service restarts, which is the 
way things work with all sorts of Unix/Linux services.
Thanks for your help,
Hank Cohen

From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, June 28, 2012 5:03 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data

Hank,

I'm able to run my HDFS with two different sets of configs independently. Can 
you share your whole NN log? One name/data directory should not conflict with 
another, but in any case, it is always good to define dfs.name.dir and 
dfs.data.dir to the absolute paths instead of relying on hadoop.tmp.dir's 
implicitness. What I do is keep two different config dirs and pass the right 
one when needing to switch from the defaults.
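For example, something along these lines (directory names made up for 
illustration; the bin/ scripts read HADOOP_CONF_DIR, and also accept --config 
as their first argument):

  # one complete config dir per storage layout
  export HADOOP_CONF_DIR=/etc/hadoop/conf.disk1
  bin/start-dfs.sh

  # ... run tests, then stop and switch ...
  bin/stop-dfs.sh
  export HADOOP_CONF_DIR=/etc/hadoop/conf.disk2
  bin/start-dfs.sh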
On Thu, Jun 28, 2012 at 1:15 PM, Giulio D'Ippolito 
<giulio.dippol...@gmail.com> wrote:
You could manually edit the VERSION file in order to match the datanode and 
namenodes id's.
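(The fields in question live in the VERSION files under each storage 
directory; for example, with placeholder paths:)

  # the namespaceID must match between the namenode and its datanodes
  grep namespaceID /path/to/dir1/dfs/name/current/VERSION
  grep namespaceID /path/to/dir1/dfs/data/current/VERSION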

2012/6/27 Hank Cohen <hank.co...@altior.com>
[nit] First of all, I think that the datanode storage location property should 
be simply dfs.data.dir, not dfs.datanode.data.dir (this is from 
src/hdfs/hdfs-default.html).

Both the namenode storage directory and the datanode storage directory are 
defined relative to hadoop.tmp.dir so simply changing that directory will 
change both of the subdirectories.  But this doesn't allow me to change back 
and forth without errors.

I get an error when I try to change hadoop.tmp.dir to a directory that already 
contains a hadoop file system.
The error is:
2012-06-27 10:40:44,144 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
java.io.IOException: Unexpected version of the file system log file: 
-333643776. Current version = -32.
[Does anyone want to see the java stack trace?]

When I look at the VERSION files (hadoop.tmp.dir/dfs/name/current/VERSION)
the only difference I see is that namespaceID is different.  I think 
namespaceID probably should be different, since it is a different file system.

Thanks for any guidance,
Hank Cohen


From: Konstantin Shvachko 
[mailto:shv.had...@gmail.com]
Sent: Monday, June 18, 2012 5:12 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Changing where HDFS stores its data

In hdfs-site.xml you should specify
dfs.namenode.name.dir
for NameNode storage directories and/or
dfs.datanode.data.dir
for DataNode storage.
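For example, a minimal hdfs-site.xml sketch (paths are placeholders; on older 
releases the property names are dfs.name.dir and dfs.data.dir, as noted 
elsewhere in this thread):

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/disk1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/disk1/dfs/data</value>
  </property>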

Changing the temporary directory location changes the default for the storage 
directories, which should also work. You might want to check the message the 
NameNode logs when it fails.

Thanks,
--Konstantin
On Mon, Jun 18, 2012 at 3:47 PM, Hank Cohen 
<hank.co...@altior.com> wrote:
I am trying to do some testing with different storage configurations for HDFS 
but I am having difficulty changing the storage destination without having to 
re-initialize the whole file system each time I change things.

What I want to do: Set up and run some test cases with two different local file 
system configurations.  Think of it as having different local disks with 
different performance characteristics.

What I have done so far is to change the hadoop.tmp.dir property in 
core-site.xml.  Let's call this dir1.
I can set this up and format the file system without any problems, run my 
tests, shut down and change core-site.xml again to dir2.
Again I can format dir2 and run my tests OK, but when I try to switch back to 
dir1 I can't get the namenode to start.  I find that I have to remove all of 
the directories and subdirectories from dir1, then reformat and start over with 
nothing in the file system.
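(For concreteness, the change in question is just this one property in 
core-site.xml, with a placeholder path, followed by bin/hadoop namenode -format 
against the new location:)

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/dir1/hadoop-tmp</value>
  </property>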

Is there an easy way to do this without having to reinitialize the whole HDFS 
each time?

Hank Cohen

+1 732-440-1280 x320 Office
+1 510-995-8264 Direct

444 Route 35 South
Building B
Eatontown, NJ 07724 USA

hank.co...@altior.com
www.altior.com


--
Harsh J

