On 22/01/2019 23:01, Makarius wrote:
On 22/01/2019 23:08, Fabian Immler wrote:
On 1/22/2019 4:00 PM, Fabian Immler wrote:
On 1/22/2019 2:28 PM, Makarius wrote:
On 22/01/2019 20:05, Fabian Immler wrote:
These numbers look quite extreme:
The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30, as
seems to be the case with polyml-test-0a6ebca445fc).

The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

ML_PLATFORM="x86-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
factor 2.66)
Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
factor 1.46)

(I am testing HOL-ODE-ARCH-COMP with
polyml-test-0a6ebca445fc/x86_64_32-linux right now, but that does take
some time...)
HOL-ODE-ARCH-COMP looked fine (2-3 times slower), but then I realized
that this was already the case with your computations, too. Lorenz_C0
is different:

x86_64_32-linux --minheap 1500
Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)

x86_64-linux --minheap 3000
Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)
Lorenz_C0 had the most significant slowdown, and it also has the
largest number of parallel computations, so I thought this might be
related.

And indeed, if you try the theory at the end: Par_List.map (with 6
cores, on Windows) is 12 times slower on x86_64_32, whereas the
sequential evaluation is only 2 times slower.
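
For reference, the comparison has roughly the following shape. The
arithmetic is made up purely to keep IntInf.divMod busy (it is not the
computation from Float_Test.thy), and the iteration count and number of
tasks are arbitrary; it can be pasted into an ML block of a throw-away
theory:

  ML ‹
    (* divmod-heavy loop: q * (n + 7) + r reconstructs acc, so the accumulator
       keeps its size and every step performs one full big-integer divMod *)
    fun burn 0 acc = acc
      | burn n acc =
          let val (q, r) = IntInf.divMod (acc, n + 7)
          in burn (n - 1) (q * (n + 7) + r) end;

    val seed = IntInf.pow (2, 2000);                 (* force genuine big integers *)
    val args = map (fn i => 200000 + i) (1 upto 6);  (* roughly one task per core *)

    val _ = timeit (fn () => map (fn n => burn n seed) args);           (* sequential *)
    val _ = timeit (fn () => Par_List.map (fn n => burn n seed) args);  (* parallel *)
  ›

On a build of Poly/ML that shows the problem, the second timing should
degrade far more than the first when switching from x86_64 to
x86_64_32.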

All of this reminds me of the discussion we had in November 2017:
https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/2017-November/thread.html#7643

Back in Nov-2017, I made the following workaround that is still active:
http://isabelle.in.tum.de/repos/isabelle/rev/5da20135f560

Looking at the profiles of the included Float_Test.thy, I now see a lot
of IntInf.divMod, but the situation is qualitatively similar to the
former IntInf.pow one.

Maybe David can revisit both of these operations, so that we can get
rid of the workarounds.
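
For orientation, the general shape of such a workaround (an
illustrative sketch only, not the content of changeset 5da20135f560) is
to route around the library primitive with a plain user-level
implementation, e.g. exponentiation by repeated squaring instead of
IntInf.pow:

  (* illustration only: power by repeated squaring for n >= 0,
     avoiding the IntInf.pow library call *)
  fun power b n =
    if n = 0 then 1
    else
      let val p = power (b * b) (n div 2)
      in if n mod 2 = 0 then p else b * p end;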

I've had a look at this under Windows and the problem seems to be with calls to IntInf.divMod from generated code, not from IntInf.pow. The underlying RTS call used by IntInf.quotRem has changed in the 32-in-64 version. It previously returned the pair of values by passing in a stack location and having the RTS update this. That isn't possible in the 32-in-64 version, so now it allocates a pair on the heap. For simplicity this is now used for the native code versions as well.

From what I can tell, nearly all the threads are waiting for mutexes, and I suspect that the extra heap allocation in IntInf.quotRem is causing some of the problem. Waiting for a contended mutex can cost a significant amount of processor time, both in busy-waiting and in kernel calls.

I'm not sure what to suggest except not to use concurrency here. There doesn't seem to be a way to avoid allocating a pair in IntInf.quotRem.
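
Purely as a caller-side experiment (a sketch only; it is unclear
whether it helps, since the caller still builds a result pair and the
division work is doubled), one could replace IntInf.divMod call sites
in the generated code by two single-result divisions:

  (* hypothetical drop-in for IntInf.divMod: two single-result divisions
     instead of one pair-returning RTS call; semantically equivalent, but
     the pair is now allocated here rather than inside the runtime call *)
  fun divMod' (i, j) = (IntInf.div (i, j), IntInf.mod (i, j));

Whether that makes any measurable difference under heavy parallelism is
exactly the kind of thing the timing sketch above could check.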

By the way, I updated IntInf.pow with
https://github.com/polyml/polyml/commit/c2a296122426f5a6205cf121218e3f38415d2b06

David
_______________________________________________
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
