For the record, here's the Apple Xgrid (hype?) page: http://www.apple.com/server/macosx/technology/xgrid.html
- Bob

On Dec 5, 2007 5:04 AM, Bob Futrelle <[EMAIL PROTECTED]> wrote:
> You've written a spirited statement about the strengths of hadoop.
> But I'd still be interested in hearing from someone who might
> understand why an Xgrid cluster with its attendant management system
> would or would not be equally good for these problems. After all,
> there are a reasonable number of Xgrid customers who are getting their
> work done.
>
> Maybe I'll need to learn more about both and also engage in some
> discussions with the Xgrid community. I do intend to bring up the
> Xgrid system on our cluster to see how it works for us. That'll
> certainly deepen my understanding of both.
>
> Thanks for the detailed reply.
>
> - Bob
>
> On Dec 5, 2007 12:17 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> > If you are looking at large numbers of independent images then hadoop should
> > be close to perfect for this analysis (the problem is embarrassingly
> > parallel). If you are looking at video, then you can still do quite well by
> > building what is essentially a probabilistic list of recognized items in the
> > video stream in the map phase, giving all frames from a single shot the same
> > reduce key. Then in the reduce phase, you can correlate the possible
> > objects and their probabilities according to object persistence models. It
> > would be good to do another pass after that to do scene to scene
> > correlations. This formulation gives you near perfect parallelism as well.
> >
> > For NLP, the problem at the level of phrasal analysis can also be made
> > trivially parallel if you have large numbers of documents. Again, you may
> > need to do a secondary pass to find duplicated references across multiple
> > documents but this is usually far less intensive than the original analysis.
> >
> > Standard scientific HPC architectures are all about facilitating arbitrary
> > communication patterns and process boundaries. This is exceedingly hard to
> > do really well and few systems attain really good performance. Hadoop is
> > all about working with a really simple primitive that is so simple that it
> > can be implemented really well with simple and cheap hardware. What is
> > surprising (a bit) is that so many problems can be well expressed as
> > map-reduce programs. Sometimes this is only true at really large scale
> > where correlations become small (allowing the map phase to do useful work on
> > many sub-units), sometimes it requires relatively large intermediate data
> > (such as many graph algorithms). The fact is, however, that it works
> > remarkably well.
> >
> > On 12/4/07 7:12 PM, "Bob Futrelle" <[EMAIL PROTECTED]> wrote:
> >
> > > For us, we want to do pattern recognition, turning
> > > raster images into collections of the objects we discover in the
> > > images. Another focus for us is NLP, esp. phrasal analysis.
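
For readers following the thread, the video formulation Ted describes (map emits per-frame detections keyed by shot, reduce correlates them per shot) might look roughly like the sketch below. This is plain Python that simulates the map, shuffle, and reduce steps in one process; it does not use the Hadoop API. The sample data, the names map_frame, reduce_shot and run_job, and the averaging rule standing in for an "object persistence model" are all assumptions made for illustration, not anything stated in the thread.

```python
from collections import defaultdict

# Hypothetical recognizer output: (shot_id, frame_no, {label: probability}).
frames = [
    ("shot-1", 0, {"car": 0.9, "dog": 0.2}),
    ("shot-1", 1, {"car": 0.8}),
    ("shot-2", 0, {"dog": 0.7}),
    ("shot-2", 1, {"dog": 0.9, "car": 0.1}),
]

def map_frame(shot_id, frame_no, detections):
    """Map phase: key each frame's detections by its shot, so that all
    frames from one shot arrive at the same reducer."""
    yield shot_id, (frame_no, detections)

def reduce_shot(shot_id, values):
    """Reduce phase: a toy stand-in for an object persistence model.
    It averages each label's probability over every frame in the shot,
    counting a frame where the label is absent as probability 0."""
    values = list(values)
    totals = defaultdict(float)
    for _frame_no, detections in values:
        for label, prob in detections.items():
            totals[label] += prob
    return shot_id, {label: total / len(values) for label, total in totals.items()}

def run_job(records):
    """Simulate the shuffle: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_frame(*record):
            groups[key].append(value)
    return dict(reduce_shot(key, values) for key, values in sorted(groups.items()))

result = run_job(frames)
print(result)
```

Because each shot is reduced independently, the reducers never need to talk to each other, which is the "near perfect parallelism" point made above. The scene-to-scene correlation Ted mentions would be a second job run over this output.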
