You've written a spirited statement about the strengths of Hadoop.
But I'd still be interested in hearing from someone who might
understand why an Xgrid cluster with its attendant management system
would or would not be equally good for these problems. After all,
there are a reasonable number of Xgrid customers who are getting their
work done.

Maybe I'll need to learn more about both and also engage in some
discussions with the Xgrid community. I do intend to bring up the
Xgrid system on our cluster to see how it works for us.  That'll
certainly deepen my understanding of both.

Thanks for the detailed reply.

 - Bob


On Dec 5, 2007 12:17 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> If you are looking at large numbers of independent images then Hadoop should
> be close to perfect for this analysis (the problem is embarrassingly
> parallel).  If you are looking at video, then you can still do quite well by
> building what is essentially a probabilistic list of recognized items in the
> video stream in the map phase, giving all frames from a single shot the same
> reduce key.  Then in the reduce phase, you can correlate the possible
> objects and their probabilities according to object persistence models.  It
> would be good to do another pass after that to do scene to scene
> correlations.  This formulation gives you near perfect parallelism as well.
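
To make the video formulation above concrete, here is a minimal sketch in
plain Python rather than actual Hadoop code. Everything in it is an
assumption made for illustration: the FRAMES data, the function names, and
above all the averaging step, which only stands in for a real object
persistence model. The map step emits each frame's probabilistic detections
keyed by shot, and the reduce step combines all frames that share a shot key.

```python
from collections import defaultdict

# Hypothetical input: (shot_id, frame_id, {label: probability}) per frame.
# In a real job the probabilities would come from a recognizer run inside
# the map phase; here they are hard-coded for illustration.
FRAMES = [
    ("shot1", 0, {"car": 0.9, "dog": 0.2}),
    ("shot1", 1, {"car": 0.7}),
    ("shot2", 0, {"dog": 0.6}),
]


def map_frame(shot_id, frame_id, detections):
    """Map: emit the frame's detections, using the shot as the reduce key."""
    yield shot_id, detections


def reduce_shot(shot_id, detections_per_frame):
    """Reduce: average each label's probability over the frames of one shot.

    Averaging is only a stand-in for an object persistence model; a label
    missing from a frame counts as probability 0 for that frame.
    """
    frames = list(detections_per_frame)
    totals = defaultdict(float)
    for detections in frames:
        for label, prob in detections.items():
            totals[label] += prob
    averaged = {label: total / len(frames) for label, total in totals.items()}
    return shot_id, averaged


def run(frames):
    """In-memory stand-in for the grouping done between map and reduce."""
    groups = defaultdict(list)
    for shot_id, frame_id, detections in frames:
        for key, value in map_frame(shot_id, frame_id, detections):
            groups[key].append(value)
    return dict(reduce_shot(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run(FRAMES))
```

Each call to map_frame depends on a single frame, and each call to
reduce_shot depends on a single shot, so both steps can be spread across
machines without any communication beyond the grouping by key.
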
>
> For NLP, the problem at the level of phrasal analysis can also be made
> trivially parallel if you have large numbers of documents.  Again, you may
> need to do a secondary pass to find duplicated references across multiple
> documents but this is usually far less intensive than the original analysis.
>
> Standard scientific HPC architectures are all about facilitating arbitrary
> communication patterns and process boundaries.  This is exceedingly hard to
> do really well and few systems attain really good performance.  Hadoop is
> all about working with a really simple primitive that is so simple that it
> can be implemented really well with simple and cheap hardware.  What is
> surprising (a bit) is that so many problems can be well expressed as
> map-reduce programs.  Sometimes this is only true at really large scale
> where correlations become small (allowing the map phase to do useful work on
> many sub-units), sometimes it requires relatively large intermediate data
> (such as many graph algorithms).  The fact is, however, that it works
> remarkably well.
>
>
> On 12/4/07 7:12 PM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:
>
> > For us, we want to do pattern recognition, turning
> > raster images into collections of the objects we discover in the
> > images. Another focus for us is NLP, esp. phrasal analysis.
>
>
