Re: Question about Hadoop

Chanchal James Sat, 14 Jun 2008 07:31:11 -0700

Thank you very much for explaining it to me, Ted. That's a great deal of
info!
I guess that could be how "Yahoo Webmap" is designed.


And for anyone trying to figure out the massiveness of Hadoop computing,
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
should give a good picture of a practical case. I was for a moment
flabbergasted, and instantly fell in love with Hadoop! ;)


On Sat, Jun 14, 2008 at 12:11 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Usually hadoop programs are not used interactively since what they excel at
> is batch operations on very large collections of data.
>
> It is quite reasonable to store resulting data in hadoop and access those
> results using hadoop.  The cleanest way to do that is to have a presentation
> layer web server that has all of the UI on it and use http to access the
> results file from hadoop via the namenode's data access URL.  This works well
> where the results are not particularly voluminous.
>
> For large quantities of data such as the output of a web-crawl, it is
> usually better to copy the output out of hadoop and into a clustered system
> that supports high speed querying of the data.  This clustered system might
> be as simple as a redundant memcache or MySQL farm or as fancy as a sharded
> and replicated farm of text retrieval engines running under Solr.  What
> works for you will vary by what you need to do.
>
> You should keep in mind that hadoop was designed for very long MTBF (for a
> cluster), but not designed for zero downtime operation.  At the very least,
> you will occasionally want to upgrade the cluster software and that
> currently can't be done during normal operations.  Combining hadoop (for
> heavy duty computations) with a separate persistence layer (for high
> availability web service) is a good hybrid.
>
> On Thu, Jun 12, 2008 at 9:53 PM, Chanchal James <[EMAIL PROTECTED]>
> wrote:
>
> > Thank you all for the responses.
> >
> > So in order to run a web-based application, I just need to put the part of
> > the application that needs to make use of distributed computation in HDFS,
> > and have the other web site related files access it via Hadoop streaming?
> >
> > Is that how Hadoop is used?
> >
> > Sorry the question may sound too silly.
> >
> > Thank you.
> >
> >
>
> --
> ted
>
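
To make Ted's first suggestion concrete (a presentation-layer web server
reading a small results file from hadoop over http), here is a minimal
sketch in Python. The host name, the port 50070, the "/data" URL prefix and
the function names are all assumptions for illustration and are not from the
thread; check what your own namenode actually exposes before relying on any
of them.

```python
from urllib.parse import quote
from urllib.request import urlopen


# Assumptions (not from the thread): the namenode web port is 50070 and
# files are served under a "/data" prefix. Adjust both for your cluster.
def results_url(namenode_host, hdfs_path, port=50070, prefix="/data"):
    """Build the http URL for a results file stored in HDFS."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    return "http://%s:%d%s%s" % (namenode_host, port, prefix, quote(hdfs_path))


def fetch_results(url, timeout=30):
    """Read the whole file into memory; only sensible for small outputs."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical host and path, shown only to illustrate the call.
    print(results_url("namenode.example.com", "/user/demo/out/part-00000"))
```

The web server would call fetch_results() on that URL and render the text
itself, which fits the case Ted describes where the results are not
particularly voluminous.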

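For the larger case Ted mentions, where the output is copied out of hadoop
into a system built for fast lookups, the loading step might look roughly
like the sketch below. It assumes the job wrote plain text lines of the form
key<TAB>value and that the part files have already been copied to local
disk. An in-memory SQLite database stands in for the memcache or MySQL farm,
and the table and function names are made up for the example.

```python
import sqlite3


def load_part_file(lines, conn):
    """Load key<TAB>value lines into a lookup table; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (k TEXT PRIMARY KEY, v TEXT)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only; the value may itself contain tabs.
        key, _, value = line.partition("\t")
        rows.append((key, value))
    # A later line with the same key overwrites the earlier one.
    conn.executemany(
        "INSERT OR REPLACE INTO results (k, v) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


def lookup(conn, key):
    """Return the stored value for key, or None if it is absent."""
    row = conn.execute(
        "SELECT v FROM results WHERE k = ?", (key,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Any iterable of lines works, including an open part file.
    load_part_file(["example.com\t42\n", "example.org\t7\n"], conn)
    print(lookup(conn, "example.com"))
```

The web tier then only ever talks to the lookup store, so the hadoop cluster
can be taken down for an upgrade without affecting the site.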