Thank you very much for explaining it to me, Ted. That's a great deal of info! I guess that could be how "Yahoo Webmap" is designed.
And for anyone trying to figure out the massiveness of Hadoop computing,
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
should give a good picture of a practical case. I was for a moment flabbergasted, and instantly fell in love with Hadoop! ;)

On Sat, Jun 14, 2008 at 12:11 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Usually hadoop programs are not used interactively since what they excel at
> is batch operations on very large collections of data.
>
> It is quite reasonable to store resulting data in hadoop and access those
> results using hadoop. The cleanest way to do that is to have a presentation
> layer web server that has all of the UI on it and use http to access the
> results file from hadoop via the namenodes data access URL. This works well
> where the results are not particularly voluminous.
>
> For large quantities of data such as the output of a web-crawl, it is
> usually better to copy the output out of hadoop and into a clustered system
> that supports high speed querying of the data. This clustered system might
> be as simple as a redundant memcache or mySql farm or as fancy as a sharded
> and replicated farm of text retrieval engines running under Solr. What
> works for you will vary by what you need to do.
>
> You should keep in mind that hadoop was designed for very long MTBF (for a
> cluster), but not designed for zero downtime operation. At the very least,
> you will occasionally want to upgrade the cluster software and that
> currently can't be done during normal operations. Combining hadoop (for
> heavy duty computations) with a separate persistence layer (for high
> availability web service) is a good hybrid.
>
> On Thu, Jun 12, 2008 at 9:53 PM, Chanchal James <[EMAIL PROTECTED]> wrote:
>
> > Thank you all for the responses.
> >
> > So in order to run a web-based application, I just need to put the part of
> > the application that needs to make use of distributed computation in HDFS,
> > and have the other web site related files access it via Hadoop streaming ?
> >
> > Is that how Hadoop is used ?
> >
> > Sorry the question may sound too silly.
> >
> > Thank you.
>
> --
> ted
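
P.S. For anyone following the thread, here is a rough sketch of the "copy the output out of hadoop" step Ted describes above. It is only an illustration under my own assumptions: the job output is taken to be plain text with one tab-separated key/value pair per line (the usual text output layout), and an in-memory SQLite database stands in for the separate persistence layer (memcache, MySQL, Solr and so on). The function names and the `results` table are made up for this example and are not part of Hadoop.

```python
import sqlite3
from typing import Iterable, Iterator, Optional, Tuple


def parse_output_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from tab-separated job output lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only; a line without a tab gets an empty value.
        key, _sep, value = line.partition("\t")
        yield key, value


def load_results(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Copy parsed output into the serving store; return the number of rows parsed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, value TEXT)"
    )
    rows = list(parse_output_lines(lines))
    # INSERT OR REPLACE keeps a reload idempotent: a repeated key is overwritten.
    conn.executemany(
        "INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


def lookup(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """Fast point query that the web tier would run against the serving store."""
    row = conn.execute(
        "SELECT value FROM results WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

In practice the lines would come from the output files after they have been copied out of HDFS to local disk, and the web application would only ever call something like `lookup`, so it keeps working while the Hadoop cluster is down for an upgrade.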
