On 07/08/13 16:19, Christopher Samuel wrote:
> Anyone seen anything similar, or any ideas on what could be going
> on?

Sorry, this was with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

Since those initial tests we've started enforcing memory limits (the
system is not yet in full production) and found that this causes jobs
to get killed.

We tried the cgroups gathering method, but jobs still die with mpirun,
and now the numbers don't seem to be right for mpirun or srun either:

mpirun (killed):

[samuel@barcoo-test Mem]$ sacct -j 94564 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94564
94564.batch    -523362K          0
94564.0         394525K          0

srun:

[samuel@barcoo-test Mem]$ sacct -j 94565 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94565
94565.batch        998K          0
94565.0          88663K          0

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
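
For anyone wanting to spot this kind of bogus accounting value across many jobs, here is a minimal sketch in Python that scans sacct output in the same three-column layout as above and reports steps with a negative MaxRSS. The function names (parse_value, negative_rss_steps) are made up for illustration, and the parsing assumes whitespace-separated columns with JobID first and MaxRSS second; it is not part of Slurm.

```python
def parse_value(field):
    """Return the numeric part of a field such as '394525K' or '-523362K'.

    The unit suffix is dropped, since only the sign matters here.
    Returns None if the field is not a number.
    """
    try:
        return float(field.rstrip("KMGT"))
    except ValueError:
        return None


def negative_rss_steps(sacct_text):
    """Return (JobID, MaxRSS) pairs whose MaxRSS is negative.

    Assumes the layout of `sacct -o JobID,MaxRSS,MaxVMSize`:
    a header row, a dashed rule, then one row per job step.
    """
    flagged = []
    for line in sacct_text.splitlines():
        tokens = line.split()
        # Skip blank lines and rows that carry only a JobID (no MaxRSS).
        if len(tokens) < 2:
            continue
        # Skip the header row and the dashed rule beneath it.
        if tokens[0] == "JobID" or set(tokens[0]) == {"-"}:
            continue
        rss = parse_value(tokens[1])
        if rss is not None and rss < 0:
            flagged.append((tokens[0], tokens[1]))
    return flagged


# The mpirun job output quoted in this message.
sample = """\
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94564
94564.batch    -523362K          0
94564.0         394525K          0
"""

print(negative_rss_steps(sample))
```

This only flags values that cannot be right because of their sign; it says nothing about why the gathering plugin reported them.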