$ time julia --proc=auto -e ""

real 0m3.292s
user 0m15.464s
sys 0m0.593s


"auto" means 8 CPUs (+1 for master) on my machine. Maybe it should mean 4 
because of hyperthreading? Or some in-between number (6?)? I didn't wait 15 
sec, more like 4: "user" is CPU time summed over all the processes.
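
To see how far "8 virtual CPUs" is from the number of physical cores, 
something like the following could be used. It is only a sketch: the 
function names are made up for illustration, and it assumes the x86 Linux 
/proc/cpuinfo layout ("processor", "physical id" and "core id" fields).

```shell
# Count logical CPUs ("processor" entries) in /proc/cpuinfo-style text on stdin.
count_logical_cpus() {
    awk '/^processor/ { n++ } END { print n + 0 }'
}

# Count physical cores: distinct (physical id, core id) pairs on stdin.
# Assumes the x86 Linux field names; other platforms may lack them.
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { pkg = $2 }
        /^core id/     { seen[pkg "," $2] = 1 }
        END { n = 0; for (k in seen) n++; print n }
    '
}

# Only touch /proc/cpuinfo where it exists (i.e. on Linux).
if [ -r /proc/cpuinfo ]; then
    echo "logical:  $(count_logical_cpus < /proc/cpuinfo)"
    echo "physical: $(count_physical_cores < /proc/cpuinfo)"
fi
```

On a hyperthreaded machine the first number would typically be twice the 
second.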


I'm not really worried about the startup wait for --proc=4, 8 or 80, unless 
it indicates problems elsewhere; I'm just curious. It seemed abnormal (at 
first) that the wait would get longer with higher numbers, or even with 1. 
The point of many procs is parallel speedup: even if all the CPUs have to do 
the same work on startup, in theory it should run in parallel. I guess this 
is just too much for the [L3] cache. The wait gets really long with 
--proc=80 (CPUs that I do not have, so I don't need it to be fast, just not 
to crash).


Does it ever make sense to go above the number of virtual CPUs for --proc? 
I was just testing out the slowdown with up to --proc=80. With VM (swap) 
off that crashed my Ubuntu 14.04 Linux (it worked with VM on): I got a black 
screen with a brown blinking cursor, couldn't even get to a virtual 
terminal, and had to reset (not entirely unexpected..). On a second try it 
froze for a long while, I couldn't get a virtual terminal, and my session 
got closed in the end, but I managed to not have to restart..


I'm not too worried about proc=1, but should I make a PR that limits procs 
to at most the number of [virtual] CPUs? I think I could manage that. Or 
maybe, if you can think of a reason to go higher, at most double the number 
of CPUs? Taking the amount of [virtual] memory into account would be more 
complicated. If there is a reason to go higher, can't the number of workers 
always be changed from within the program? The programmer should know 
better and maybe should have that capability, but for users it doesn't seem 
user friendly that you can crash the machine by invoking julia from the 
shell with a high number like --proc=80. That option doesn't seem needed, it 
is just waiting for tinkerers like me who like to try everything out.. :-/
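
The kind of limit I have in mind could look like this on the shell side. It 
is a sketch only: clamp_procs is a hypothetical wrapper helper, not 
something Julia provides, and the factor argument is my "at most double" 
idea from above.

```shell
# Hypothetical helper, not part of Julia: cap a requested worker count.
# usage: clamp_procs REQUESTED NCPU [FACTOR]
clamp_procs() {
    requested=$1
    ncpu=$2
    factor=${3:-1}          # allow up to FACTOR workers per logical CPU
    if [ "$requested" -lt 1 ]; then
        echo "clamp_procs: worker count must be >= 1" >&2
        return 1
    fi
    limit=$((ncpu * factor))
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Possible use (assumes coreutils' nproc for the logical CPU count):
#   julia -p "$(clamp_procs 80 "$(nproc)")" -e ""
```

With 8 virtual CPUs, a request for 80 would become 8, or 16 with a factor 
of 2.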


$ time julia --proc=0 -e ""
ERROR: julia: -p,--procs=<n> must be an integer >= 1

I can see with --proc=1:

$ ps aux |grep julia
qwerty    8278 45.7  1.5 8720228 127244 pts/9  Sl+  17:05   0:01 julia --proc=1
qwerty    8282 22.7  1.4 8616920 116076 ?      Ssl  17:05   0:00 /usr/bin/julia -Cx86-64 -J/usr/lib/x86_64-linux-gnu/julia/sys.so --bind-to 130.208.69.54 --worker

that you get one worker on top of the one master, but is that mostly a 
waste? Should the message say "must be an integer >= 2 and no more than the 
number of virtual processors"? Does proc=1 ever make sense? Is it for 
testing, or should it maybe do the same as when proc is left out (1 process 
vs 1+1)?

Is the worker's memory use (116076 KiB resident, 1.4% of my 8 GB; the 
8616920 is only the virtual size) about a constant that can't be reduced 
much? It would mean that a low-end Android phone (512 MB) would max out at 
about 4.4 workers, if that (as the system must use some memory itself and 
you only have zram "compressed VM", no "real VM"), and crash with proc=5, 
maybe with 4 or lower.
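
The arithmetic behind that estimate, as a sketch: max_workers is a made-up 
helper, and it assumes each worker costs a constant 116076 KiB (the RSS 
from the ps output above) and that nothing else needs memory, which is 
optimistic.

```shell
# Rough upper bound on worker processes that fit in a memory budget.
# usage: max_workers AVAILABLE_KIB PER_WORKER_KIB
max_workers() {
    echo $(( $1 / $2 ))     # integer division: whole workers only
}

max_workers 524288 116076     # 512 MiB phone, expressed in KiB
max_workers 8130224 116076    # the "KiB Mem" total from top on my machine
```

By the same rough estimate my 8 GB would hold about 70 idle workers, so it 
is at least plausible that --proc=80 does not fit without swap.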


Note that I tested everything with virtual memory off (and then also on), 
as I have lots of VM (on an SSD), maybe too much (with some of the swap in 
use). Because of copy-on-write (COW) I did not expect Julia to use a 
constant amount of memory multiplied by --proc, but that is in fact what 
happens. COW probably doesn't help on fork, as most of what Julia works on 
is considered data, not (only) code..?

Compared to:

$ time julia -e ""

real 0m0.137s
user 0m0.117s
sys 0m0.038s

$ time julia --proc=1 -e ""

real 0m1.646s
user 0m2.302s
sys 0m0.213s


$ time julia --proc=2 -e ""

real 0m2.307s
user 0m4.475s
sys 0m0.317s


$ time julia --proc=3 -e ""

real 0m2.509s
user 0m6.118s
sys 0m0.437s


$ time julia --proc=4 -e ""

real 0m2.608s 
user 0m7.845s
sys 0m0.502s



$ time julia --proc=8 -e ""

real 0m3.003s
user 0m15.457s
sys 0m0.824s



$ time julia --proc=80 -e ""

real 0m26.826s
user 2m15.768s
sys 0m8.688s
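
One way to read the timings above is the ratio of "user" to "real", which 
is roughly how many CPUs were busy on average during startup. A sketch, 
with made-up helper names, that parses the XmY.Zs format printed by time:

```shell
# Convert a time(1)-style duration such as "2m15.768s" to seconds.
to_seconds() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of CPU time to wall-clock time: average number of busy CPUs.
# usage: cpu_per_wall USER REAL
cpu_per_wall() {
    LC_ALL=C awk -v u="$(to_seconds "$1")" -v r="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", u / r }'
}

cpu_per_wall 0m2.302s 0m1.646s     # --proc=1
cpu_per_wall 0m15.457s 0m3.003s    # --proc=8
cpu_per_wall 2m15.768s 0m26.826s   # --proc=80
```

For my numbers this gives roughly 1.4, 5.1 and 5.1, so the startup work 
does run in parallel but seems to level off at about 5 busy CPUs out of 8 
virtual ones. Whether that is hyperthreading or the cache I can't say.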



I also tested with VM on:

top - 15:19:51 up 12 days, 22:55, 12 users, load average: 0,47, 3,35, 3,79
Tasks: 267 total, 1 running, 266 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu1 : 0,3 us, 0,0 sy, 0,0 ni, 99,7 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu2 : 0,3 us, 0,0 sy, 0,0 ni, 99,7 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu3 : 0,0 us, 0,0 sy, 0,0 ni, 99,3 id, 0,7 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu4 : 0,0 us, 0,3 sy, 0,0 ni, 99,7 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu5 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu6 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu7 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem: 8130224 total, 1201108 used, 6929116 free, 1212 buffers
KiB Swap: 31264764 total, 2585184 used, 28679580 free. 116152 cached Mem


top - 15:01:12 up 12 days, 22:37, 12 users, load average: 1,60, 1,72, 1,54
Tasks: 298 total, 2 running, 296 sleeping, 0 stopped, 0 zombie
%Cpu(s): 16,4 us, 1,0 sy, 0,0 ni, 82,0 id, 0,5 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem: 8130224 total, 7020028 used, 1110196 free, 3116 buffers
KiB Swap: 31264764 total, 1924952 used, 29339812 free. 704908 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13950 qwerty 20 0 1340360 247648 17256 R 99,8 3,0 418:09.71 chromium-browse
12606 qwerty 20 0 2896688 1,359g 195360 S 8,3 17,5 290:57.88 chromium-browse

-- 
Palli.

