I'm testing my first dual-socket, quad-core Opteron 2350-based server.
Let me assume that the RAM used by the kernel and system processes is zero, that there is no physical RAM fragmentation, and that the affinity of processes to CPU cores is maintained. I also assume that both nodes are populated with an equal number of identical DIMMs.

If I run a thread-parallelized application (for example, with OpenMP) with 8 threads (8 = the number of CPU cores in the server), the ideal case for all the ("equal") threads is this: the shared memory used by each of the two CPUs (by each "quad" of threads) should be divided equally between the two nodes, and the local memory used by each thread should be mapped analogously. In theory, such an ideal case could be realized if my application (8 threads) used practically all the RAM and used only shared memory (I also assume here that all RAM addresses carry the same load, and that the size of the program code is zero :-) ).

The questions are:
1) Is there some way to distribute the threads' local memory analogously (I assume it has the same size for each thread) using "reasonable" NUMA allocation?
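For example, a minimal sketch of what I mean, assuming Linux's default "first touch" placement (a page is physically allocated on the node of the thread that first writes to it):

   /* first_touch.c -- compile: gcc -O2 -fopenmp first_touch.c */
   #include <stdio.h>
   #include <stdlib.h>

   #define N (1L << 26)                /* 512 MB of doubles */

   int main(void)
   {
       double *a = malloc(N * sizeof(double));
       if (a == NULL) return 1;

       /* First touch in parallel: each thread's chunk of a[] is faulted
          in by that thread, so its pages land on the thread's local node. */
       #pragma omp parallel for schedule(static)
       for (long i = 0; i < N; i++)
           a[i] = 0.0;

       /* Compute loops must reuse the same static schedule, so every
          thread keeps working on the chunk it touched (and owns locally). */
       #pragma omp parallel for schedule(static)
       for (long i = 0; i < N; i++)
           a[i] += 1.0;

       printf("a[0] = %g\n", a[0]);
       free(a);
       return 0;
   }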

2) Is it right that using numactl for an application may give a performance improvement in the following case: the number of application processes equals the number of cores of one CPU *AND* the amount of RAM the application needs fits into the DIMMs of one node (I assume that RAM is allocated "contiguously")?

And what will happen to performance (when using numactl) if the required RAM size is larger than the RAM available on one node, so that the program cannot exploit (load-balanced) simultaneous use of the memory controllers of both CPUs? (Again, I assume that RAM is allocated contiguously.)
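For concreteness, the two situations I mean look roughly like this (my_application is a placeholder; exact option spellings depend on the numactl version):

   # everything fits on node 0: bind both the CPUs and the memory locally
   numactl --cpunodebind=0 --membind=0 my_application

   # working set larger than one node: interleave pages across both
   # nodes, so both memory controllers are used
   numactl --interleave=all my_application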

3) Is there any reason to use something like the following?
   mpirun -np N /usr/bin/numactl <numactl_parameters> my_application
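(With a plain mpirun line like the above, all N ranks get the same numactl policy. To give each rank its own node, one apparently needs a small wrapper; a hypothetical sketch, assuming the MPI launcher exports a local-rank variable such as Open MPI's OMPI_COMM_WORLD_LOCAL_RANK:)

   #!/bin/sh
   # bind_rank.sh -- hypothetical wrapper: even local ranks -> node 0,
   # odd local ranks -> node 1
   NODE=$(( OMPI_COMM_WORLD_LOCAL_RANK % 2 ))
   exec numactl --cpunodebind=$NODE --membind=$NODE "$@"

and then:

   mpirun -np 8 ./bind_rank.sh my_application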

4) If I use malloc() and don't use numactl, how can I find out from which node Linux will begin the actual memory allocation? (Remember, I assume that all the RAM is free.) And how can I find out which DIMMs correspond to the higher RAM addresses and which to the lower ones?
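To check where pages really land, I imagine something like the following sketch (move_pages(2) with a NULL node list only queries placement; building with -lnuma assumes libnuma is installed):

   /* where.c -- compile: gcc where.c -lnuma */
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <numaif.h>

   int main(void)
   {
       long pagesz = sysconf(_SC_PAGESIZE);
       void *buf;
       if (posix_memalign(&buf, pagesz, pagesz) != 0) return 1;
       memset(buf, 0, pagesz);   /* first touch: physical page exists now */
       void *pages[1] = { buf };
       int  status[1];
       /* nodes == NULL: report the node each page currently resides on */
       if (move_pages(0, 1, pages, NULL, status, 0) == 0)
           printf("page resides on node %d\n", status[0]);
       return 0;
   }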

5) In which cases is it reasonable to switch on "Node memory interleaving" (in the BIOS) for an application that uses more memory than is present on one node? And BTW: if I use taskset -c CPU1,CPU2,... <program_file> and the program creates some new processes, will all of these processes run only on the CPUs given in the taskset command?
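For the taskset part, the concrete check I have in mind (taskset -p prints the affinity mask of an already-running process; core numbers 0-3 for one socket are my guess):

   # start bound to the four cores of one socket
   taskset -c 0,1,2,3 program_file
   # then inspect a process that program_file spawned
   taskset -p <pid_of_child>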

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center,
Zelinsky Institute of Organic Chemistry
Moscow