Hello Wei,

I speak from experience, having written many distributed HPC applications using Open MPI (C/C++) on x86, PowerPC and Cell B.E. processors, and Parallel Virtual Machine (PVM) long before that, back in the '90s. I can say with absolute certainty:
*Any gains you believe there are because "C++ is faster than Java/Scala" will be completely wiped out by the inordinate amount of time you spend debugging your code and/or reinventing the wheel to do even basic tasks like linear regression.*

There are undoubtedly some very specialised use cases where MPI and its brethren still dominate High Performance Computing -- for example, the nuclear decay simulations the US Department of Energy runs on supercomputers, where they've invested billions solving that one use case.

Spark is part of the wider "Big Data" ecosystem, and its biggest advantages are traction amongst internet-scale companies, hundreds of developers contributing to it, and a community of thousands using it.

- Need a distributed, fault-tolerant file system? Use HDFS.
- Need a distributed, fault-tolerant message queue? Use Kafka.
- Need to co-ordinate between your worker processes? Use ZooKeeper.
- Need to run on a flexible grid of computing resources and handle failures? Run it on Mesos!

The barrier to entry with Spark is very low: download the latest distribution and start the Spark shell. The language bindings for Scala / Java / Python are excellent, which means you spend less time writing boilerplate code and more time solving problems.

Even if you believe you *need* native code to do something specific, like fetching HD video frames from satellite video capture cards -- wrap it in a small native library and use Java Native Access (JNA) to call it from your Java/Scala code.

Have fun, and if you get stuck we're here to help!

MC

On 16 June 2014 08:17, Wei Da <xwd0...@gmail.com> wrote:
> Hi guys,
> We are making choices between C++ MPI and Spark. Is there any official
> comparation between them? Thanks a lot!
>
> Wei
>
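To make the "download the latest distribution and start the Spark shell" step concrete, something like the following works on Linux/OS X. The version number and package name are assumptions (1.0.0 was current at the time of writing) -- substitute whatever the downloads page lists as the latest release:

```shell
# Grab a prebuilt Spark release (version/package name are assumptions --
# check the Spark downloads page for the current one)
wget http://archive.apache.org/dist/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz
tar xzf spark-1.0.0-bin-hadoop2.tgz
cd spark-1.0.0-bin-hadoop2

# Start an interactive Scala shell; a local SparkContext is bound to `sc`
./bin/spark-shell
```

That's the whole setup -- no cluster required to start experimenting, since the shell runs Spark in local mode by default.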
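For the native-code escape hatch, the JNA pattern looks roughly like this. This is a sketch, not the capture-card code itself: it binds libc's getpid() as a stand-in for your real wrapper library, and it assumes jna.jar is on the classpath. Your own code would swap in your library's name and exported functions:

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

public class NativeCallExample {
    // Map a native shared library: JNA resolves "c" to libc on Unix-like
    // systems. For your own code this would be the name of your small
    // wrapper library (e.g. "framegrab" for libframegrab.so -- a
    // hypothetical name, for illustration only).
    public interface CLib extends Library {
        CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);

        // Each method declared here maps 1:1 to an exported native function.
        int getpid();
    }

    public static void main(String[] args) {
        // Calls straight through to the native function -- no hand-written
        // JNI glue code required.
        System.out.println("Native getpid() returned: " + CLib.INSTANCE.getpid());
    }
}
```

The interface declaration is all the binding code you write; JNA handles the marshalling, so the rest of your application stays in Java/Scala.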