Right, my preference would be to use HDFS exclusively, except that there are potential issues with many small files in HDFS, and a suggestion that MogileFS might handle many small files better. My strong preference is to store everything in HDFS, then run map/reduce over the small files to produce results.

Since there is a concern about storing a lot of small files in HDFS, I now wonder if I should collect the small files in MogileFS, then periodically merge them into large files, store those in HDFS, and then run my map/reduce jobs. Ick, that sounds complex/time-consuming just writing about it :-(.

The files I anticipate processing are all compressed (gzip), and are on the order of 80-200M compressed. I expect to collect 4-8 of these files per hour for most hours in the day.
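The merge step described above could be sketched roughly as follows. This is only an illustration, not a tested part of any pipeline: the function names `merge_gzip_files` and `count_lines` and all paths are hypothetical, and it assumes the collected files are already on a local disk. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip stream, so nothing has to be decompressed to merge them.

```python
import gzip
import shutil
from pathlib import Path


def merge_gzip_files(sources, destination):
    """Concatenate gzip files byte-for-byte into one multi-member gzip file.

    Hypothetical helper: the inputs are copied as-is, never decompressed.
    """
    destination = Path(destination)
    with destination.open("wb") as out:
        for src in sources:
            with Path(src).open("rb") as part:
                # Stream in chunks so large inputs are never held in memory.
                shutil.copyfileobj(part, out)
    return destination


def count_lines(path):
    """Read a (possibly multi-member) gzip file as one stream and count lines."""
    with gzip.open(path, "rb") as stream:
        return sum(1 for _ in stream)
```

With 4-8 files per hour at 80-200M each, merging one hour's worth this way would give a single file of roughly 320M to 1.6G compressed. Copying the merged file into HDFS is left out here, since how that is done depends on the setup.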
Ted Dunning <[EMAIL PROTECTED]> wrote:

Absolutely, you can use map/reduce without HDFS. That is the standard debugging style, for one thing. For another, Nutch is all about accessing non-HDFS data (i.e. the web).

If I get your drift this time, though, I would expect that you will have severe problems with bandwidth if you have a lot of task nodes working on a conventional file store. One of the great virtues of map/reduce + HDFS is that most of the map inputs are read from local disk, as are the reduce inputs. Any system that doesn't coordinate work with storage this way is likely to suffer from lower throughput due to network congestion.

-----Original Message-----
From: C G [mailto:[EMAIL PROTECTED]
Sent: Thu 9/6/2007 5:54 PM
To: [email protected]
Subject: Re: Use HDFS as a long term storage solution?

Actually my question is "Can you use Map/Reduce without HDFS?"

Ted Dunning wrote:

You can definitely use HDFS without map/reduce. It should be pretty easy to use it from a variety of languages as well, although it is unlikely that there are language bindings available off the shelf.

....

On 9/6/07 1:04 PM, "C G" wrote:

> Do you have to use HDFS with map/reduce? I don't fully understand how closely
> bound map/reduce is to HDFS.
