"While testing I had to delete the temporary "datastore" folder and
reformat
the file system a couple of times."
Is it because you left hadoop.tmp.dir and the other *.dir parameters at
their defaults? Try setting hadoop.tmp.dir to a directory that is not
under /tmp.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name:
<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
</property>
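For example, you could override it in your hadoop-site.xml like this
(/var/hadoop/tmp is only an example path; use any directory that is not
cleaned out on reboot the way /tmp often is):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>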
Haijun
-----Original Message-----
From: Chanchal James [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2008 10:16 AM
To: [email protected]
Subject: Re: Question about Hadoop
Thanks, Lohit, for the info. I have one more question.
If I keep all data in HDFS, is there any way I can back it up regularly?
While testing I had to delete the temporary "datastore" folder and
reformat the file system a couple of times. So when using Hadoop in a
real environment, what are the chances of such software-side,
uncorrectable problems occurring? Can we correct them without a
reformat? I cannot afford to lose the data I plan to put in HDFS.
Thank you.
On Thu, Jun 12, 2008 at 12:02 PM, lohit <[EMAIL PROTECTED]> wrote:
> Ideally what you would want is your data to be on HDFS, and to run
> your map/reduce jobs on that data. The Hadoop framework splits your
> data and feeds those splits to each map or reduce task. One problem
> with image files is that you will not be able to split them.
> Alternatively, people have done this: they wrap image files in XML,
> creating huge files that contain multiple image files. Hadoop offers
> something called streaming, with which you will be able to split the
> files at XML boundaries and feed them to your map/reduce tasks.
> Streaming also lets you use any code, like Perl/PHP/C++.
> Check the info about streaming here:
> http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
> And information about parsing XML files in streaming here:
> http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
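>
> For example, a streaming job that splits wrapped image files at an
> <image>...</image> boundary might look like this; the jar path, HDFS
> paths, tag name, and mapper script below are only illustrative:
>
>   hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
>       -input /user/me/wrapped-images \
>       -output /user/me/image-output \
>       -inputreader "StreamXmlRecord,begin=<image>,end=</image>" \
>       -mapper process_images.pl \
>       -file process_images.pl \
>       -reducer NONE
>
> Each map task then receives whole <image>...</image> records on
> stdin, so the mapper can be any executable you like.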
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Chanchal James <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, June 12, 2008 9:42:46 AM
> Subject: Question about Hadoop
>
> Hi,
>
> I have a question about Hadoop. I am a beginner and just testing
> Hadoop. I would like to know how a PHP application would benefit from
> this, say an application that needs to work on a large number of
> image files. Do I have to store the application in HDFS always, or do
> I just copy it to HDFS when needed, do the processing, and then copy
> it back to the local file system? Is that the case with the data
> files too? Once I have Hadoop running, do I keep all data &
> application files in HDFS always, and not use local file system
> storage?
>
> Thank you.