Looks good to me...
-----Original Message-----
From: Chanchal James [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2008 11:22 AM
To: [email protected]
Subject: Re: Question about Hadoop

Haijun, I have most of the settings as default, but not tmp dir. I have the
tmp dir set to "/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}".
Is this a good location?

On Thu, Jun 12, 2008 at 12:59 PM, Haijun Cao <[EMAIL PROTECTED]> wrote:
>
> "While testing I had to delete the temporary "datastore" folder and
> reformat the file system a couple of times."
>
> Is that because you left hadoop.tmp.dir and the other .dir parameters at
> their defaults? Try setting hadoop.tmp.dir to a directory not under /tmp.
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/tmp/hadoop-${user.name}</value>
>   <description>A base for other temporary directories.</description>
> </property>
>
> dfs.name.dir is by default under ${hadoop.tmp.dir}/dfs/name:
>
> <property>
>   <name>dfs.name.dir</name>
>   <value>${hadoop.tmp.dir}/dfs/name</value>
> </property>
>
> Haijun
>
> -----Original Message-----
> From: Chanchal James [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 12, 2008 10:16 AM
> To: [email protected]
> Subject: Re: Question about Hadoop
>
> Thanks Lohit for the info. I have one more question.
> If I keep all data in HDFS, is there any way I can back it up regularly?
> While testing I had to delete the temporary "datastore" folder and
> reformat the file system a couple of times. So while using Hadoop in a
> real environment, what are the chances of such uncorrectable software-side
> problems occurring, and can they be corrected without a reformat? I cannot
> afford to lose the data I plan to put in HDFS.
>
> Thank you.
>
> On Thu, Jun 12, 2008 at 12:02 PM, lohit <[EMAIL PROTECTED]> wrote:
>
> > Ideally what you would want is your data to be on HDFS, and to run your
> > map/reduce jobs on that data. The Hadoop framework splits your data and
> > feeds those splits to each map or reduce task. One problem with image
> > files is that you will not be able to split them. Alternatively, people
> > have done this: they wrap image files within XML and create huge files
> > that contain multiple image files. Hadoop offers something called
> > streaming, with which you will be able to split the files at XML
> > boundaries and feed them to your map/reduce tasks. Streaming also
> > enables you to use any code like perl/php/c++.
> > Check info about streaming here:
> > http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
> > And information about parsing XML files in streaming here:
> > http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
> >
> > Thanks,
> > Lohit
> >
> > ----- Original Message ----
> > From: Chanchal James <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Thursday, June 12, 2008 9:42:46 AM
> > Subject: Question about Hadoop
> >
> > Hi,
> >
> > I have a question about Hadoop. I am a beginner and just testing Hadoop.
> > Would like to know how a php application would benefit from this, say an
> > application that needs to work on a large number of image files. Do I
> > have to store the application in HDFS always, or do I just copy it to
> > HDFS when needed, do the processing, and then copy it back to the local
> > file system? Is that the case with the data files too? Once I have
> > Hadoop running, do I keep all data & application files in HDFS always,
> > and not use local file system storage?
> >
> > Thank you.
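
A minimal sketch of the overrides discussed above, assuming they go in
conf/hadoop-site.xml. The paths are only examples following the
/usr/local/hadoop/hadoop-datastore location mentioned earlier; since
dfs.name.dir and dfs.data.dir default to subdirectories of
${hadoop.tmp.dir}, overriding hadoop.tmp.dir alone is usually enough:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<!-- Optional: pin the namenode and datanode directories explicitly.
     By default both live under ${hadoop.tmp.dir}; set them separately
     only if you want them on different disks. -->
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}/dfs/data</value>
</property>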

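A rough sketch of the kind of streaming invocation Lohit describes, assuming
the 0.17 streaming contrib jar and a hypothetical mapper.php that reads one
XML-wrapped image record at a time from stdin; the paths and the <image>
begin/end tags are made up for illustration, and the exact record-reader
syntax is in the streaming FAQ linked above:

hadoop jar contrib/streaming/hadoop-0.17.0-streaming.jar \
    -input  /user/chanchal/images-xml \
    -output /user/chanchal/images-out \
    -mapper "php mapper.php" \
    -reducer NONE \
    -file mapper.php \
    -inputreader "StreamXmlRecord,begin=<image>,end=</image>"

Only the wrapped-XML input data needs to live in HDFS; the PHP script itself
is shipped to the task nodes with -file for each job, so the application does
not have to be stored in HDFS permanently.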