Hi All,

I recently entered a Hadoop implementation in the SICSA Multicore Challenge, held last week:
http://www.macs.hw.ac.uk/sicsawiki/index.php/Challenge_PhaseI
The aim was to implement the concordance application, in whichever language or framework you felt best. We ended up comparing a wide variety, including Erlang, Parallel Haskell, Java with Fork/Join, and OpenMP, amongst others. Whilst most implementations achieved very low runtimes on the very small inputs, Hadoop was not able (and is not designed) to do so. Where the Hadoop implementation shone, however, was in scaling with input size.

I have written a summary of my implementation and its optimizations, with a link to the complete set of slides I presented, at this address:
http://www.macs.hw.ac.uk/~rs46/multicore_challenge1/

Perhaps the highlight of these results (running on 16 nodes) is:

Benchmark 1 --- Input File: Bible.txt (801,541 words); Runtime: 36 seconds
Benchmark 2 --- Input File: ascii100MB.txt (18,030,005 words); Runtime: 65 seconds

That is a 22.5x increase in input size (18,030,005 / 801,541), but only a 1.8x increase in runtime (65 / 36).

Feedback would be welcome. It was interesting to see that some of the shared-memory implementations could not process the 100 MB file without out-of-memory errors; this was not a problem for Hadoop.

There is a plan to hold another Multicore Challenge in May 2011. If anyone wants to make inquiries, I suggest getting in touch with the facilitator, Hans-Wolfgang Loidl, who is named at the bottom of this page:
http://www.sicsa.ac.uk/news/sicsa-multicore-challenge

Regards,

Rob Stewart
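
P.S. For anyone less familiar with Hadoop, below is a minimal sketch of roughly what a concordance-style MapReduce job looks like, written against the org.apache.hadoop.mapreduce API. It only indexes single words against the byte offset of the line they occur on, whereas the implementation above handles multi-word phrases and includes the optimizations described in the slides, so treat the class and method names here as illustrative placeholders rather than my actual code.

// Illustrative sketch only: single-word concordance, not the submitted implementation.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Concordance {

  // Map: for every word on a line, emit (word, byte offset of that line).
  public static class WordOffsetMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      StringTokenizer tok = new StringTokenizer(line.toString());
      while (tok.hasMoreTokens()) {
        String word = tok.nextToken().toLowerCase().replaceAll("[^a-z']", "");
        if (!word.isEmpty()) {
          ctx.write(new Text(word), offset);
        }
      }
    }
  }

  // Reduce: collect all offsets for a word into "count [o1, o2, ...]".
  public static class OffsetListReducer
      extends Reducer<Text, LongWritable, Text, Text> {
    @Override
    protected void reduce(Text word, Iterable<LongWritable> offsets, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder sb = new StringBuilder();
      long count = 0;
      for (LongWritable o : offsets) {
        if (count++ > 0) sb.append(", ");
        sb.append(o.get());
      }
      ctx.write(word, new Text(count + " [" + sb + "]"));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "concordance");
    job.setJarByClass(Concordance.class);
    job.setMapperClass(WordOffsetMapper.class);
    job.setReducerClass(OffsetListReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}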
