Hi,
I am running Molpro in parallel for the very first time, on an IBM Regatta (p690) system with a Colony switch.
I have used Global Arrays 3.3.1 (GA, TARGET=LAPI64), and it seems to run fine.
However, now I want to get as much as possible out of it, and many points are unclear to me.
The typical jobs of our users are CCSD(T) optimizations.
Running one of these, I would like to use memory for the integrals as much as possible and fall back to disk only when that is not possible. The disk is a shared GPFS filesystem: concurrent access to it brings no benefit, and it has become the bottleneck for the code.
With typical options I have found that the tasks do not use much memory, neither standard memory nor GA memory.
Running on 16 CPUs, at the very beginning of the output I found:
********** ARMCI configured for 2 cluster nodes
MPP nodes nproc
sp154 8
sp152 8
ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.
Primary working directories: /scratch_ssa/abc0
Secondary working directories: /scratch_ssa/abc0
blaslib=default
MPP tuning parameters: Latency= 84 Microseconds, Broadcast speed= 233 MB/sec
default implementation of scratch files=ga
**********
Only when I use one task (or maybe one node) do I get ga_uses_ma=true.
On the other hand, the statement "default implementation of scratch files=ga" leads me to think that the scratch files are "in-memory files"; however, what happens at run time does not match this.
In fact, I observe a lot of I/O, and the memory used is about 200 MB (of 2 GB) for each task.
After the CCSD I get:
DISK USED * 9.10 GB
GA USED * 120.58 MB (max) .00 MB (current)
And in fact, at the beginning of the input I set: memory,200,M
(that is not the GA memory, but the -G option is ignored, and I do not understand why).
Can any of you explain some of these facts and give some suggestions for parallel runs?
For instance, I also tried direct calculations, but:
1. it was very slow;
2. it terminated with the error:
****** FILE 5 RECORD 1380 OFFSET= 0. NOT FOUND
Records on file 5
IREC NAME TYPE OFFSET LENGTH IMPLEMENTATION EXT PREV PARENT MPP_STATE
1 4000 4096. 21301. df 0 0 0 1
2 4001 25397. 166404. df 0 0 0 1
3 4002 191801. 10725. df 0 0 0 0
4 4003 202526. 178782. df 0 0 0 1
5 35020 381308. 10496. df 0 0 0 1
6 3600 391804. 273. df 0 0 0 1
7 3601 392077. 273. df 0 0 0 1
8 35000 392350. 10. df 0 0 0 1
9 35001 392360. 10. df 0 0 0 1
10 35010 392370. 320. df 0 0 0 1
11 35011 392690. 320. df 0 0 0 1
12 7005 393010. 314964. df 0 0 0 1
13 8005 707974. 314964. df 0 0 0 1
14 9101 1022938. 9567696. df 0 0 0 0
15 9103 10590634. 9567696. df 0 0 0 0
? Error
? Record not found
? The problem occurs in readm
ERROR EXIT
CURRENT STACK: CIPRO MAIN
*******
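As a quick sanity check on the listing above, here is a minimal Python sketch (not part of Molpro) that parses the record table and confirms that record 1380 is indeed not among the listed record names. The helper name `parse_records` is hypothetical, and the sketch assumes that the TYPE column is empty, so that each row has nine whitespace-separated fields.

```python
# Hypothetical helper, not part of Molpro: parse the "Records on file 5" listing.
# Assumption: the TYPE column is empty, so every row has nine fields:
# IREC NAME OFFSET LENGTH IMPLEMENTATION EXT PREV PARENT MPP_STATE
LISTING = """\
1 4000 4096. 21301. df 0 0 0 1
2 4001 25397. 166404. df 0 0 0 1
3 4002 191801. 10725. df 0 0 0 0
4 4003 202526. 178782. df 0 0 0 1
5 35020 381308. 10496. df 0 0 0 1
6 3600 391804. 273. df 0 0 0 1
7 3601 392077. 273. df 0 0 0 1
8 35000 392350. 10. df 0 0 0 1
9 35001 392360. 10. df 0 0 0 1
10 35010 392370. 320. df 0 0 0 1
11 35011 392690. 320. df 0 0 0 1
12 7005 393010. 314964. df 0 0 0 1
13 8005 707974. 314964. df 0 0 0 1
14 9101 1022938. 9567696. df 0 0 0 0
15 9103 10590634. 9567696. df 0 0 0 0
"""


def parse_records(text):
    """Return a dict mapping record name to its offset, length, and flags."""
    records = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that does not look like a table row.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        records[int(fields[1])] = {
            "offset": float(fields[2]),
            "length": float(fields[3]),
            "implementation": fields[4],
            "mpp_state": int(fields[8]),
        }
    return records


records = parse_records(LISTING)
print("record 1380 listed:", 1380 in records)
print("records with MPP_STATE 0:",
      sorted(name for name, r in records.items() if r["mpp_state"] == 0))
```

This only confirms what the error message says: the requested record does not appear in the file's directory. It does not explain why the record was never written.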
Many thanks for any help.
Regards
Sigismondo Boschi
-- 
Sigismondo Boschi, Ph.D.              tel: +39 051 6171559
CINECA (High Performance Systems)     fax: +39 051 6137273 - 6132198
via Magnanelli, 6/3                   http://instm.cineca.it
40033 Casalecchio di Reno (BO)-ITALY  http://www.cineca.it
