I have two questions: 1. I am wondering how Hadoop MR performs when it runs compute-intensive applications, e.g. estimating Pi with the Monte Carlo method. There's an example in 0.21, QuasiMonteCarlo, but that example doesn't use random numbers; it generates quasi-random input upfront. If we used distributed random number generation instead, then I guess the performance of Hadoop should be similar to that of a message-passing framework like MPI. So my guess is that, with the proper method, Hadoop would be competitive with MPI on compute-intensive applications.
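For context, the underlying estimator I mean is the standard quarter-circle Monte Carlo method: sample points uniformly in the unit square and count the fraction that land inside the quarter circle. A minimal single-process sketch in plain Python (the function name and seed parameter are my own, not from the Hadoop example):

```python
import random

def estimate_pi(n_samples, seed=None):
    # Sample points uniformly in [0,1)x[0,1); the fraction landing
    # inside the quarter circle x^2 + y^2 <= 1 approximates pi/4.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(1_000_000, seed=42))  # close to 3.14159
```

In a distributed version, each mapper would run this loop with its own independent RNG stream and emit its `inside` count, and a single reducer would sum the counts, which is why the communication cost is tiny compared to the computation.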
2. I am looking for applications that have large data sets and require intensive computation. Such an application can be divided into a workflow that includes both map-reduce operations and message-passing-style operations. For example, in step 1 I use Hadoop MR to process 10TB of data and generate a small output, say 10GB. This 10GB can fit into memory and is better processed with some interprocess communication, which will boost performance, so in step 2 I would use MPI, etc. Is there any application with this property, perhaps in some scientific research area? Or is it fine to just use map-reduce by itself? Regards, Elton
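P.S. To make the two-stage workflow in question 2 concrete, here is a small sketch of the pattern I have in mind, in plain Python. The function names are my own, and threads stand in for MPI ranks just to illustrate the shape of the pipeline, not as a real implementation:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_reduce(records):
    # Stage 1: aggregate a large record stream down to small per-key sums
    # (in the scenario above, the Hadoop MR job that turns 10TB into 10GB).
    totals = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return dict(totals)

def refine(item):
    # Placeholder for the compute-intensive per-item work of stage 2.
    key, value = item
    return key, value * value

def pipeline(records, workers=4):
    # Stage 2: the reduced data fits in memory, so tightly coupled parallel
    # workers (threads here, MPI ranks in the real setting) process it.
    stage1 = map_reduce(records)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return dict(ex.map(refine, stage1.items()))

print(pipeline([("a", 1.0), ("a", 2.0), ("b", 4.0)]))
# prints {'a': 9.0, 'b': 16.0}
```

The point of the split is that stage 1 is I/O-bound and embarrassingly parallel, while stage 2 is small enough to benefit from low-latency communication between workers.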
