I agree with George and Marcin that getting significant speedups is non-trivial. Unfortunately, as CPU clock speeds drop in the interest of energy efficiency, the only way to keep performance up is to work in parallel. Accelerator technology and Hadoop make many-core environments more accessible.
OpenMP and OpenACC directives provide a relatively simple way of achieving modest gains. Compilers already perform unseen optimizations on your code, and in the future they may use any accelerators you have without any changes to your code. I have not benchmarked any CCP4 programs, so any gains may be disappointing. All I know is that it is possible.

Adam
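
P.S. To give a rough idea of what a directive looks like, here is a minimal sketch in C. It is not taken from any CCP4 program: the function name sum_of_squares and the loop itself are made up for illustration. The only OpenMP-specific part is the single pragma line, which asks the compiler to split the loop across threads and combine the partial sums.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel for illustration only: sum of squares of n values. */
double sum_of_squares(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    double total = 0.0;

    /* Each thread accumulates a private partial sum; the reduction clause
       adds the partial sums into total when the loop finishes.
       A compiler without OpenMP enabled ignores this pragma and runs
       the loop serially, so the source stays valid either way. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += x[i] * x[i];
    }

    return total;
}
```

With GCC this would be built with -fopenmp to enable the directive; without that flag the same file compiles as ordinary serial C. Whether a loop like this actually runs faster depends on the problem size and the thread start-up overhead, so it needs measuring case by case.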
