Adapting programs to make optimal use of modern multicore chips is
non-trivial and depends very much on the algorithms employed;
there is no magic bullet. First one needs to understand Amdahl's law.
Assume we have a chip with four cores, such as the widely used
Intel i7, and the program consists of two parts that would each take one
minute on a single-core machine, i.e. the total time taken is 2
minutes. If we succeed in making part 1 fully parallel while part 2 stays
serial, then the time required on four cores will be 0.25 + 1.0 minutes
= 1.25 minutes. However many cores we have, we will never reduce the
total time to less than 1 minute! So it is important to make ALL
rate-determining stages parallel.
However, this is only approximately what happens in practice, because:
1) The i7 uses hyperthreading, so it can run 2 threads on each core.
However, the two threads share the core's number-crunching units, so this
only helps if the threads frequently have to wait, e.g. to get data from
the hard disk. For efficient number-crunching code, hyperthreading
does not help much.
2) The i7 actually increases its clock frequency when it is running only
one thread ('Turbo Boost'), because the critical factor is the amount of
heat generated. So the speed-up when using multiple cores is smaller than
one would expect.
3) In special cases (which I have never managed to achieve) the speed-up
becomes 'super-linear', i.e. greater than the number of cores. This is
because each core has its own cache memory, so by dividing up a large
matrix (too big for one cache) so that each piece is held entirely in a
core's cache, the speed-up can be more than expected for the number of
cores, because cache access is much faster than RAM access.
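The matrix-splitting trick described in 3) is usually written as blocked
(tiled) loops; a hypothetical C sketch, where BLOCK is a tile size one
would tune so that the working tiles fit in a core's cache:

```c
#include <stddef.h>

#define BLOCK 64  /* tile size; would be tuned so tiles fit in cache */

/* Naive matrix multiply: C = A * B, all n x n, row-major. Strides over
   whole rows and columns, so large matrices keep falling out of cache. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

/* Blocked multiply: same arithmetic, but each BLOCK x BLOCK tile is
   reused while it is still hot in cache, so far fewer trips to RAM. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n*n; i++) C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t k = kk; k < kk + BLOCK && k < n; k++) {
                        double a = A[i*n + k];
                        for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                            C[i*n + j] += a * B[k*n + j];
                    }
}
```

When the tiles for each thread fit in that core's private cache, the
combined cache capacity of all cores is what makes a super-linear
speed-up possible at all.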
The key to writing efficient parallel code is to reduce the
communication between threads to an absolute minimum. By doing this in
the program shelxd (heavy-atom location for experimental phasing) I was
able to achieve a 27-fold speed-up on a 32-core machine.
However, for my other OpenMP programs the gain was much more modest. For
some of them there is little advantage in using more
than about 4 cores, primarily because of Amdahl's law.
George
On 12/18/2013 07:50 PM, Marcin Wojdyr wrote:
> On Tue, Dec 17, 2013 at 03:32:52PM +0000, Adam Ralph wrote:
>> Dear Chang,
>> Some CCP4 progs can be used with a multi-core machine,
>> using OpenMP threads (including refmac it would appear). You will
> I think only phaser and aimless.
> Of course using 4 cores doesn't mean running 4 times faster
> (it's more like ~2x faster for Phaser).
>> need to compile the code from source rather than taking the binary
>> versions
> These programs are already compiled with OpenMP in CCP4.
>> Even if the CCP4 apps are not parallel themselves, they can access
>> a parallel version of libraries e.g. FFTW, LAPACK. Again you will
>> probably need to compile CCP4 from source and link with the correct
>> libraries.
> It's possible, but I doubt it will make a noticeable difference.
> Refmac runs don't spend much time in LAPACK. Probably the same with
> FFTW, which is used by programs that use Kevin's clipper.
> One thing that can make a big difference is the env. variable
> GFORTRAN_UNBUFFERED_ALL. It shouldn't be set. If it is set, some
> programs run a few times slower.
> Marcin
--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-33021 or -33068
Fax. +49-551-39-22582