I think it's a bit off-topic, but oh well. Let's be clear about vocabulary first:
* data parallelism: executing the same instructions on multiple cores
* task parallelism: running multiple different tasks/functions on multiple cores
* model parallelism: this is somewhat specific to science; if you need to evaluate A and B and they are independent, evaluate both in parallel

Parallelism is good for CPU-, memory- or cache-bound algorithms. Parallelism is: "I want to do something, how best to distribute it across all my resources?"

And then there is concurrency (Actors, channels, CSP ...). Concurrency is: "Many things want my attention, how should I split it so that everyone is satisfied?" Concurrency is good for IO-bound operations, so that while waiting for something you don't block everything else (say, waiting for a web page to load, or for a file to be transferred or saved to disk ...).

OpenMP is a data parallel framework for shared-memory parallelism on a single (multi-core) machine (see the first sketch below). For task parallelism the reference is Intel TBB (Threading Building Blocks, used extensively in OpenCV). For data parallelism on a distributed system (network, cluster), the standard is MPI (Message Passing Interface). OpenMP and MPI can be mixed so that MPI distributes the load across the different cluster nodes and OpenMP distributes each node's sub-load across all of its cores (see the second sketch below).

You can check very short examples of matrix multiplication using OpenMPI, Intel TBB, OpenMP and Intel Cilk Plus (an older alternative to TBB) here: [http://blog.speedgocomputing.com/search/label/parallelization](http://blog.speedgocomputing.com/search/label/parallelization)
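To make the OpenMP part concrete, here is a minimal sketch of shared-memory data parallelism: the same instructions (the loop body) run on several cores, each core handling a different chunk of the data. The array sizes, file name and compile command are just illustrative; any OpenMP-capable compiler works.

```cpp
// saxpy_omp.cpp -- minimal OpenMP data-parallel sketch.
// Compile with e.g.:  g++ -fopenmp saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const long n = 1000000;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    // OpenMP splits the iteration space across the available cores;
    // every thread executes the same loop body on its own chunk.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }

    std::printf("y[0] = %f\n", y[0]);  // expected: 5.0
    return 0;
}
```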
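And here is a minimal sketch of the hybrid MPI + OpenMP scheme mentioned above, assuming an MPI installation (e.g. Open MPI or MPICH) and an OpenMP-capable compiler. MPI splits the work across ranks (typically one per cluster node), and OpenMP splits each rank's slice across that node's cores. Again, sizes and commands are only illustrative.

```cpp
// hybrid.cpp -- MPI distributes the load, OpenMP parallelises each node's share.
// Compile/run with e.g.:  mpicxx -fopenmp hybrid.cpp -o hybrid
//                         mpirun -np 4 ./hybrid
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns an equal slice of the (conceptual) global array.
    const long local_n = 1000000;
    std::vector<double> chunk(local_n, 1.0);

    // OpenMP spreads the work on this slice across the node's cores.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < local_n; ++i) {
        local_sum += chunk[i] * chunk[i];
    }

    // MPI combines the per-node partial results on rank 0.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("global sum over %d ranks = %f\n", size, global_sum);
    }

    MPI_Finalize();
    return 0;
}
```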
If you want to continue on this, it would be best to create another thread with a clearer title, in my opinion.