Thanks Lohit for the info. I have one more question. If I keep all data in HDFS, is there any way I can back it up regularly? While testing I had to delete the temporary "datastore" folder and reformat the file system a couple of times. So when using Hadoop in a real environment, what are the chances of such uncorrectable software-side problems occurring? Can they be corrected without a reformat? I cannot afford to lose the data I plan to put in HDFS.
Thank you.

On Thu, Jun 12, 2008 at 12:02 PM, lohit <[EMAIL PROTECTED]> wrote:

> Ideally what you would want is your data to be on HDFS and run your
> map/reduce jobs on that data. Hadoop framework splits your data and feeds
> those splits to each map or reduce task. One problem with image files is
> that you will not be able to split them. Alternatively, people have done
> this: they wrap image files within XML and create huge files which have
> multiple image files in them. Hadoop offers something called streaming, using
> which you will be able to split the files at XML boundaries and feed them to
> your map/reduce tasks. Streaming also enables you to use any code like
> perl/php/c++.
> Check info about streaming here
> http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
> And information about parsing XML files in streaming is here
> http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Chanchal James <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, June 12, 2008 9:42:46 AM
> Subject: Question about Hadoop
>
> > Hi,
> >
> > I have a question about Hadoop. I am a beginner and just testing Hadoop.
> > Would like to know how a php application would benefit from this, say an
> > application that needs to work on a large number of image files. Do I have
> > to store the application in HDFS always, or do I just copy it to HDFS when
> > needed, do the processing, and then copy it back to the local file system?
> > Is that the case with the data files too? Once I have Hadoop running, do I
> > keep all data & application files in HDFS always, and not use local file
> > system storage?
> >
> > Thank you.
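
A minimal sketch of the XML-wrapping idea described in the quoted reply, for illustration only. The element names (`images`, `image`), the `name` attribute, and the choice of base64 for the binary payload are assumptions of this sketch, not something the thread or Hadoop prescribes. In a real streaming job the mapper would be handed one XML record at a time rather than the whole document.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_images(images):
    """Pack {filename: raw bytes} into one XML document.

    Each file becomes one <image> element (hypothetical layout), with the
    bytes base64-encoded so they are safe to embed as XML text.
    """
    root = ET.Element("images")
    for name, data in images.items():
        elem = ET.SubElement(root, "image", name=name)
        elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_images(xml_text):
    """Yield (filename, raw bytes) for every <image> element in the document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("image"):
        # elem.text is None for an empty element, so fall back to "".
        yield elem.get("name"), base64.b64decode(elem.text or "")
```

Each `<image>` element is a self-contained record, which is what would let a record reader cut the large file at element boundaries instead of in the middle of an image.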
