Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-28 Thread David Matthews

On 28/01/2019 09:20, Bertram Felgenhauer wrote:

I've made sure that the machine is mostly idle; things will be much
different if several processes start competing for the resources.

I intended to experiment with smaller numbers but have not yet done so.
On another, similar machine, --gcthreads=2 was just as fast as 6.



It is increasingly clear that the limiting factor is the bandwidth 
between main memory and the processors.  There needs to be sufficient 
parallelism to saturate this channel; more than that will not help and 
may well make things worse.


How many threads are needed to saturate the memory channel depends on 
the processor speed, the memory speed, and the application, so there is no 
single answer.  Garbage collection can be particularly bad: 
each word in the live data is read in, and if it is a pointer, it is 
followed.  There is no reuse, so caching doesn't help.
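The pointer-chasing behaviour described here can be sketched with a toy mark phase (a simplified illustration in Python, not Poly/ML's actual collector): each live cell is visited once and its pointers are followed in graph order, so there is essentially no cache reuse.

```python
# Toy mark phase over an object graph: heap maps an object id to the list
# of ids stored in its pointer fields. Each live object is read once and
# its pointers followed -- the traversal order is dictated by the object
# graph, not by memory layout, which is why caching helps so little.
def mark_live(heap, roots):
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)            # read the cell once
        stack.extend(heap[obj])  # follow every pointer field
    return live

# Objects 0 -> 1 -> 2 are reachable from the root; 3 is garbage.
heap = {0: [1], 1: [2], 2: [], 3: [0]}
print(sorted(mark_live(heap, [0])))  # [0, 1, 2]
```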


The idea behind developing the 32-bit value model for the 64-bit 
processor was that by reducing the size of an ML cell more cells could 
be transferred through the memory channel in a given time.  That seems 
to have been borne out by the results.
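The back-of-the-envelope arithmetic behind this is simple (the 3-word cell here is an illustrative assumption, not a measured Poly/ML figure):

```python
# A hypothetical 3-word ML cell (header plus two fields): halving the
# word size halves the bytes that must cross the memory channel per cell.
words_per_cell = 3
bytes_full_64 = words_per_cell * 8   # full 64-bit values: 24 bytes/cell
bytes_32_in_64 = words_per_cell * 4  # 32-bit value model: 12 bytes/cell
print(bytes_full_64 // bytes_32_in_64)  # 2 -> twice the cells per unit of bandwidth
```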


David
___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-28 Thread Bertram Felgenhauer
Makarius wrote:
> There was indeed something odd with sharing: David Matthews has changed
> that in polyml-1b2dcf8f5202 (see Isabelle/bb0a354f6b46).

I may give it another try later...

> > Hardware:
> >   i7-6850K CPU @ 3.60GHz (6 cores x 2 hyperthreads) / 32GB RAM / SSD
> > 
> > Common build flags:
> >   ISABELLE_BUILD_OPTIONS="threads=12 parallel_proofs=2"
> > 
> > Configurations:
> >   polyml 5.7.1        x86_64 ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"
> >   polyml-a444f281ccec x86_64 ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"
> 
> For this hardware I recommend threads=6.

In the sessions beyond HOL, I see a speedup from threads=12. The timings
are basically the same with threads=10, but I have not checked the full
range of possibilities.

I've made sure that the machine is mostly idle; things will be much
different if several processes start competing for the resources.

Configurations:
  threads=6
  threads=10
  threads=12

Times:
  Finished Pure (0:00:13 elapsed time, 0:00:15 cpu time, factor 1.12)
  Finished Pure (0:00:13 elapsed time, 0:00:15 cpu time, factor 1.14)
  Finished Pure (0:00:13 elapsed time, 0:00:15 cpu time, factor 1.14)
  
  Finished HOL (0:02:21 elapsed time, 0:07:47 cpu time, factor 3.30)
  Finished HOL (0:02:25 elapsed time, 0:08:56 cpu time, factor 3.68)
  Finished HOL (0:02:32 elapsed time, 0:09:46 cpu time, factor 3.84)
  
  Finished HOL-Lib (0:05:47 elapsed time, 0:25:18 cpu time, factor 4.37)
  Finished HOL-Lib (0:05:12 elapsed time, 0:32:29 cpu time, factor 6.24)
  Finished HOL-Lib (0:05:10 elapsed time, 0:35:32 cpu time, factor 6.88)
  
  Finished HOL-AFP (0:06:33 elapsed time, 0:27:55 cpu time, factor 4.26)
  Finished HOL-AFP (0:05:51 elapsed time, 0:34:10 cpu time, factor 5.83)
  Finished HOL-AFP (0:05:41 elapsed time, 0:35:44 cpu time, factor 6.28)
  
  Finished IsaFoR_1 (0:05:11 elapsed time, 0:22:30 cpu time, factor 4.34)
  Finished IsaFoR_1 (0:05:04 elapsed time, 0:27:15 cpu time, factor 5.38)
  Finished IsaFoR_1 (0:05:21 elapsed time, 0:29:58 cpu time, factor 5.59)
  
  Finished IsaFoR_2 (0:05:57 elapsed time, 0:21:49 cpu time, factor 3.66)
  Finished IsaFoR_2 (0:05:43 elapsed time, 0:26:24 cpu time, factor 4.61)
  Finished IsaFoR_2 (0:05:40 elapsed time, 0:27:39 cpu time, factor 4.87)
  
  Finished IsaFoR_3 (0:07:13 elapsed time, 0:31:05 cpu time, factor 4.30)
  Finished IsaFoR_3 (0:06:51 elapsed time, 0:40:15 cpu time, factor 5.87)
  Finished IsaFoR_3 (0:06:45 elapsed time, 0:43:32 cpu time, factor 6.45)
  
  Finished IsaFoR_4 (0:08:37 elapsed time, 0:24:58 cpu time, factor 2.90)
  Finished IsaFoR_4 (0:08:48 elapsed time, 0:27:08 cpu time, factor 3.08)
  Finished IsaFoR_4 (0:08:47 elapsed time, 0:27:52 cpu time, factor 3.17)
  
  Finished Code (0:08:04 elapsed time, 0:08:12 cpu time, factor 1.02)
  Finished Code (0:08:00 elapsed time, 0:08:08 cpu time, factor 1.02)
  Finished Code (0:07:59 elapsed time, 0:08:08 cpu time, factor 1.02)

> Moreover note that --gcthreads is automatically provided, if ML_OPTIONS
> does not say anything else.

I intended to experiment with smaller numbers but have not yet done so.
On another, similar machine, --gcthreads=2 was just as fast as 6.

Cheers,

Bertram


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-27 Thread Makarius
On 25/01/2019 20:29, Bertram Felgenhauer wrote:
> 
> - While most heap images are about 40% smaller with x86_64_32, this is
>   not always the case; some heap images ended up being even larger in
>   this experiment. Could there be a problem with the sharing pass on
>   x86_64_32?

There was indeed something odd with sharing: David Matthews has changed
that in polyml-1b2dcf8f5202 (see Isabelle/bb0a354f6b46).


> Hardware:
>   i7-6850K CPU @ 3.60GHz (6 cores x 2 hyperthreads) / 32GB RAM / SSD
> 
> Common build flags:
>   ISABELLE_BUILD_OPTIONS="threads=12 parallel_proofs=2"
> 
> Configurations:
>   polyml 5.7.1        x86_64 ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"
>   polyml-a444f281ccec x86_64 ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"

For this hardware I recommend threads=6.

Moreover note that --gcthreads is automatically provided, if ML_OPTIONS
does not say anything else.


>   623083120 IsaFoR_2 (1.00)
>   623291272 IsaFoR_2 (1.00)
>   367497971 IsaFoR_2 (0.59)
>   884213127 IsaFoR_2 (1.42) ?

I have briefly tried IsaFoR_2 and now get 624867156 (1.00), which is
better but still slightly odd.


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-25 Thread Bertram Felgenhauer
Bertram Felgenhauer wrote:
> Makarius wrote:
> > So this is the right time for further testing of applications:
> > Isabelle2018 should work as well, but I have not done any testing beyond
> > "isabelle build -g main" -- Isabelle development only moves forward in
> > one direction on a single branch.
>
> I have tried this with Isabelle2018 and IsaFoR; I've encountered no
> problems and there's a nice speedup (estimated 1.25 times faster).
> Heap images are 40% smaller, which is a welcome change as well.

I have gathered some more data (below). There are some curiosities:

- Code export (that's the only thing that the 'Code' session does,
  besides theory merges) seems to be slower with polyml-a444f281ccec
  compared to polyml 5.7.1, even when sticking to the x86_64 platform.

- A similar slowdown seems to affect the IsaFoR_3 session, but I don't
  know what's special about that one.

- While most heap images are about 40% smaller with x86_64_32, this is
  not always the case; some heap images ended up being even larger in
  this experiment. Could there be a problem with the sharing pass on
  x86_64_32?

Cheers,

Bertram

Hardware:
  i7-6850K CPU @ 3.60GHz (6 cores x 2 hyperthreads) / 32GB RAM / SSD

Software:
  GNU/Linux
  Isabelle2018
  afp-2018 @ 400c722462b3
  IsaFoR @ acb9096ce08a

Common build flags:
  ISABELLE_BUILD_OPTIONS="threads=12 parallel_proofs=2"

Configurations:
  polyml 5.7.1        x86_64    ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"
  polyml-a444f281ccec x86_64    ML_OPTIONS="--maxheap 24G -H 500 --gcthreads 6"
  polyml-0a6ebca445fc x86_64_32 ML_OPTIONS="--maxheap 16G -H 500 --gcthreads 6"
  polyml-a444f281ccec x86_64_32 ML_OPTIONS="--maxheap 16G -H 500 --gcthreads 6"

Timings:
  Finished Pure (0:00:11 elapsed time, 0:00:14 cpu time, factor 1.20)
  Finished Pure (0:00:15 elapsed time, 0:00:17 cpu time, factor 1.18)
  Finished Pure (0:00:13 elapsed time, 0:00:15 cpu time, factor 1.14)
  Finished Pure (0:00:13 elapsed time, 0:00:15 cpu time, factor 1.13)

  Finished HOL (0:02:46 elapsed time, 0:10:50 cpu time, factor 3.91)
  Finished HOL (0:02:41 elapsed time, 0:10:05 cpu time, factor 3.75)
  Finished HOL (0:02:32 elapsed time, 0:09:46 cpu time, factor 3.84)
  Finished HOL (0:02:28 elapsed time, 0:09:20 cpu time, factor 3.78)

  Finished HOL-Lib (0:05:00 elapsed time, 0:35:54 cpu time, factor 7.18)
  Finished HOL-Lib (0:05:19 elapsed time, 0:36:41 cpu time, factor 6.88)
  Finished HOL-Lib (0:05:10 elapsed time, 0:35:32 cpu time, factor 6.88)
  Finished HOL-Lib (0:04:57 elapsed time, 0:34:16 cpu time, factor 6.91)

  Finished HOL-AFP (0:05:59 elapsed time, 0:37:39 cpu time, factor 6.28)
  Finished HOL-AFP (0:06:15 elapsed time, 0:38:31 cpu time, factor 6.15)
  Finished HOL-AFP (0:05:41 elapsed time, 0:35:44 cpu time, factor 6.28)
  Finished HOL-AFP (0:05:33 elapsed time, 0:35:01 cpu time, factor 6.30)

  Finished IsaFoR_1 (0:05:46 elapsed time, 0:31:43 cpu time, factor 5.49)
  Finished IsaFoR_1 (0:05:42 elapsed time, 0:31:06 cpu time, factor 5.46)
  Finished IsaFoR_1 (0:05:21 elapsed time, 0:29:58 cpu time, factor 5.59)
  Finished IsaFoR_1 (0:05:17 elapsed time, 0:29:15 cpu time, factor 5.54)

  Finished IsaFoR_2 (0:06:07 elapsed time, 0:29:37 cpu time, factor 4.84)
  Finished IsaFoR_2 (0:06:04 elapsed time, 0:29:15 cpu time, factor 4.82)
  Finished IsaFoR_2 (0:05:40 elapsed time, 0:27:39 cpu time, factor 4.87)
  Finished IsaFoR_2 (0:05:59 elapsed time, 0:28:14 cpu time, factor 4.72)

  Finished IsaFoR_3 (0:07:30 elapsed time, 0:47:56 cpu time, factor 6.38)
  Finished IsaFoR_3 (0:09:58 elapsed time, 0:55:27 cpu time, factor 5.56) ?
  Finished IsaFoR_3 (0:06:45 elapsed time, 0:43:32 cpu time, factor 6.45)
  Finished IsaFoR_3 (0:06:49 elapsed time, 0:43:20 cpu time, factor 6.35)

  Finished IsaFoR_4 (0:09:40 elapsed time, 0:30:25 cpu time, factor 3.15)
  Finished IsaFoR_4 (0:09:14 elapsed time, 0:29:16 cpu time, factor 3.17)
  Finished IsaFoR_4 (0:08:47 elapsed time, 0:27:52 cpu time, factor 3.17)
  Finished IsaFoR_4 (0:08:52 elapsed time, 0:27:57 cpu time, factor 3.15)

  Finished Code (0:05:30 elapsed time, 0:05:42 cpu time, factor 1.03)
  Finished Code (0:08:34 elapsed time, 0:08:46 cpu time, factor 1.02) ?
  Finished Code (0:07:59 elapsed time, 0:08:08 cpu time, factor 1.02)
  Finished Code (0:08:00 elapsed time, 0:08:08 cpu time, factor 1.02)

  total 00:48:29
  total 00:54:02
  total 00:48:08
  total 00:48:08

Heap image sizes (factor):
   28081037 Pure (1.00)
   25131549 Pure (0.89)
   19271077 Pure (0.68)
   19270685 Pure (0.68)

  410972244 HOL (1.00)
  403934084 HOL (0.98)
  249583659 HOL (0.61)
  249808871 HOL (0.61)

  525949867 HOL-Lib (1.00)
  526132427 HOL-Lib (1.00)
  308088218 HOL-Lib (0.59)
  307456194 HOL-Lib (0.58)

  913422879 HOL-AFP (1.00)
  915450415 HOL-AFP (1.00)
  537947934 HOL-AFP (0.59)
  538445922 HOL-AFP (0.59)

  724672759 IsaFoR_1 (1.00)
  724822231 IsaFoR_1 (1.00)
  428960922 IsaFoR_1 (0.59)
  429004970 IsaFoR_1 (0.59)

  623083120 IsaFoR_2 (1.00)
  

Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-25 Thread Fabian Immler

Great, thanks for looking into this!
Fabian

On 1/23/2019 4:24 PM, David Matthews wrote:

On 23/01/2019 21:08, Makarius wrote:

On 23/01/2019 12:05, David Matthews wrote:


I've had a look at this under Windows and the problem seems to be with
calls to IntInf.divMod from generated code, not from IntInf.pow.  The
underlying RTS call used by IntInf.quotRem has changed in the 32-in-64
version.  It previously returned the pair of values by passing in a
stack location and having the RTS update this.  That isn't possible in
the 32-in-64 version so now it allocates a pair on the heap.  For
simplicity this is now used for the native code versions as well.  From
what I can tell nearly all the threads are waiting for mutexes and I
suspect that the extra heap allocation in IntInf.quotRem is causing some
of the problem.  Waiting for a contended mutex can cost a significant
amount of processor time both in busy-waiting and in kernel calls.

I'm not sure what to suggest except not to use concurrency here.  There
doesn't seem to be a way to avoid allocating a pair in IntInf.quotRem.


In the meantime David has produced Poly/ML a444f281ccec and I can
confirm that it works very well:


I had another look and it was a mutex contention problem, just not 
exactly where I'd thought.  Most calls to the run-time system involved 
taking a lock for a very short period in order to get the thread's C++ 
data structure.  This wasn't a problem in the vast majority of cases but 
this example involves very many calls to the long-format arbitrary 
precision package.  That resulted in contention on the lock.  I've 
changed this so the lock is no longer needed and it seems to have solved 
the problem.


David






Re: [isabelle-dev] Poly/ML x86_64_32 available for testing (polyml-test-a444f281ccec)

2019-01-25 Thread Makarius
On 23/01/2019 23:44, Makarius wrote:
> Isabelle/20bc1d26c932 now provides an updated polyml-test-a444f281ccec
> (active by default).
> 
> It performs slightly better than the previous test version -- I have
> also removed old workarounds for integer arithmetic in
> Isabelle/4591221824f6.

One more performance result in this respect:

Isabelle/4591221824f6 + AFP/2eacf2ed7d5d

x86_64_32-linux --minheap 1500 threads=8

Finished Flyspeck-Tame (3:37:32 elapsed time, 5:32:38 cpu time, factor 1.53)

This is about two times better than what we were used to. It is a
combination of high-end software (current Poly/ML) with high-end
hardware (Intel Xeon Gold 6148 CPU @ 2.40GHz, 20 CPU cores, 64 GB
memory, SSD).


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-25 Thread Tobias Nipkow
This is really fantastic - at last I can build HOL-Analysis on my terrible new 
Mac without having to put it in the freezer, and it does not fall over at the 
very end. It is also 20% faster.


Great work!
Tobias

On 19/01/2019 21:43, Makarius wrote:

Thanks to great work by David Matthews, there is now an Isabelle
component polyml-test-0a6ebca445fc (Isabelle/3b777286c3ec), which can be
enabled manually, e.g. in $ISABELLE_HOME_USER/etc/settings like this:

   init_component "$HOME/.isabelle/contrib/polyml-test-0a6ebca445fc"

It requires the usual "isabelle components -a" incantation afterwards.


This supports 3 platform variants:

   x86
   x86_64
   x86_64_32

x86 is still there for comparison, but will disappear soon.

x86_64 is the full 64-bit model as before, and subject to the system
option "ML_system_64 = true".

x86_64_32 is the default. It is a synthesis of the x86_64 architecture
(more memory, more registers etc.) with a 32-bit value model for ML.
Thus we get approx. 5 times more memory than with x86, without the penalty
of full 64-bit values.

Moreover, we have a proper "64-bit application" according to Apple (and
Linux distributions): it is getting increasingly hard to run old x86
executables, and soon it may not be possible at all. In other
words, Poly/ML is now getting ready for many more years to come.


The above component already works smoothly for all of Isabelle + AFP,
with only rare, spurious drop-outs. x86_64_32 is
already more stable than x86, which often suffers from out-of-memory
situations.


So this is the right time for further testing of applications:
Isabelle2018 should work as well, but I have not done any testing beyond
"isabelle build -g main" -- Isabelle development only moves forward in
one direction on a single branch.

For really big applications, it might occasionally help to use something
like "--maxheap 8g" in ML_OPTIONS: the implicit maximum is 16g, which is
sometimes too much for many parallel jobs on mid-range hardware. There
are additional memory requirements outside the ML heap, for program code
and stacks.


Makarius







Re: [isabelle-dev] Poly/ML x86_64_32 available for testing (polyml-test-a444f281ccec)

2019-01-23 Thread Makarius
Isabelle/20bc1d26c932 now provides an updated polyml-test-a444f281ccec
(active by default).

It performs slightly better than the previous test version -- I have
also removed old workarounds for integer arithmetic in
Isabelle/4591221824f6.


It is important to check that obsolete entries in
$ISABELLE_HOME_USER/etc/settings are cleaned up, so that this latest
and greatest version gets used.


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-23 Thread David Matthews

On 23/01/2019 21:08, Makarius wrote:

On 23/01/2019 12:05, David Matthews wrote:


I've had a look at this under Windows and the problem seems to be with
calls to IntInf.divMod from generated code, not from IntInf.pow.  The
underlying RTS call used by IntInf.quotRem has changed in the 32-in-64
version.  It previously returned the pair of values by passing in a
stack location and having the RTS update this.  That isn't possible in
the 32-in-64 version so now it allocates a pair on the heap.  For
simplicity this is now used for the native code versions as well.  From
what I can tell nearly all the threads are waiting for mutexes and I
suspect that the extra heap allocation in IntInf.quotRem is causing some
of the problem.  Waiting for a contended mutex can cost a significant
amount of processor time both in busy-waiting and in kernel calls.

I'm not sure what to suggest except not to use concurrency here.  There
doesn't seem to be a way to avoid allocating a pair in IntInf.quotRem.


In the meantime David has produced Poly/ML a444f281ccec and I can
confirm that it works very well:


I had another look and it was a mutex contention problem, just not 
exactly where I'd thought.  Most calls to the run-time system involved 
taking a lock for a very short period in order to get the thread's C++ 
data structure.  This wasn't a problem in the vast majority of cases but 
this example involves very many calls to the long-format arbitrary 
precision package.  That resulted in contention on the lock.  I've 
changed this so the lock is no longer needed and it seems to have solved 
the problem.


David


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-23 Thread Makarius
On 23/01/2019 12:05, David Matthews wrote:
> 
> I've had a look at this under Windows and the problem seems to be with
> calls to IntInf.divMod from generated code, not from IntInf.pow.  The
> underlying RTS call used by IntInf.quotRem has changed in the 32-in-64
> version.  It previously returned the pair of values by passing in a
> stack location and having the RTS update this.  That isn't possible in
> the 32-in-64 version so now it allocates a pair on the heap.  For
> simplicity this is now used for the native code versions as well.  From
> what I can tell nearly all the threads are waiting for mutexes and I
> suspect that the extra heap allocation in IntInf.quotRem is causing some
> of the problem.  Waiting for a contended mutex can cost a significant
> amount of processor time both in busy-waiting and in kernel calls.
> 
> I'm not sure what to suggest except not to use concurrency here.  There
> doesn't seem to be a way to avoid allocating a pair in IntInf.quotRem.

In the meantime David has produced Poly/ML a444f281ccec and I can
confirm that it works very well:

Isabelle/2444c8b85aac
AFP/2eacf2ed7d5d

x86_64_32-linux --minheap 1500 threads=8
Finished HOL-ODE-Numerics (0:17:18 elapsed time, 0:45:28 cpu time,
factor 2.63)
Finished Lorenz_C0 (0:12:06 elapsed time, 1:29:19 cpu time, factor 7.37)

x86_64-linux --minheap 1500 threads=8
Finished HOL-ODE-Numerics (0:19:19 elapsed time, 0:49:46 cpu time,
factor 2.58)
Finished Lorenz_C0 (0:06:54 elapsed time, 0:50:34 cpu time, factor 7.33)


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-23 Thread Makarius
On 23/01/2019 14:14, Bertram Felgenhauer wrote:
> Makarius wrote:
>> On 22/01/2019 12:31, Bertram Felgenhauer wrote:
>>> Makarius wrote:
 So this is the right time for further testing of applications:
 Isabelle2018 should work as well, but I have not done any testing beyond
 "isabelle build -g main" -- Isabelle development only moves forward in
 one direction on a single branch.
>>>
>>> I have tried this with Isabelle2018 and IsaFoR; I've encountered no
>>> problems and there's a nice speedup (estimated 1.25 times faster).
>>> Heap images are 40% smaller, which is a welcome change as well.
>>
>> Is that compared to x86_64 (full 64-bit) or x86 (old 32-bit)?
> 
> This is compared to x86_64. Sorry, I should have mentioned this,
> but to my mind it was implied because IsaFoR is notorious for running
> out of memory with the x86 version of PolyML.

OK, this is the kind of application that x86_64_32 has been made for:
lower memory requirements (< 16 GB) and much better speed within that limit.


>> I am asking this, because I have noted a speedup of building heap
>> images: x86_64_32 compared to x86, and was wondering about the reasons
>> for it. (For x86_64 everything is just more bulky, of course, including
>> heaps.)
> 
> As far as I can see, one difference between x86 and x86_64_32 is that
> PolyML keeps heap objects aligned to 8 byte boundaries for the latter.
> This may impact performance.

I misinterpreted my earlier measurements: the test version is actually a
bit slower in dumping heap images, but x86 is worse than x86_64_32 in
this respect. Something to be investigated further ...


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-23 Thread Bertram Felgenhauer
Makarius wrote:
> On 22/01/2019 12:31, Bertram Felgenhauer wrote:
> > Makarius wrote:
> >> So this is the right time for further testing of applications:
> >> Isabelle2018 should work as well, but I have not done any testing beyond
> >> "isabelle build -g main" -- Isabelle development only moves forward in
> >> one direction on a single branch.
> > 
> > I have tried this with Isabelle2018 and IsaFoR; I've encountered no
> > problems and there's a nice speedup (estimated 1.25 times faster).
> > Heap images are 40% smaller, which is a welcome change as well.
> 
> Is that compared to x86_64 (full 64-bit) or x86 (old 32-bit)?

This is compared to x86_64. Sorry, I should have mentioned this,
but to my mind it was implied because IsaFoR is notorious for running
out of memory with the x86 version of PolyML.

> I am asking this, because I have noted a speedup of building heap
> images: x86_64_32 compared to x86, and was wondering about the reasons
> for it. (For x86_64 everything is just more bulky, of course, including
> heaps.)

As far as I can see, one difference between x86 and x86_64_32 is that
PolyML keeps heap objects aligned to 8-byte boundaries for the latter.
This may impact performance.
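The space cost of that alignment can be made concrete (illustrative arithmetic only, not PolyML internals): rounding object sizes up to 8-byte boundaries pads out objects that a 4-byte-aligned x86 heap could pack tightly.

```python
def align(size, boundary):
    # Round size up to the next multiple of boundary (a power of two or not).
    return (size + boundary - 1) // boundary * boundary

print(align(12, 4))  # 12 -> no padding with 4-byte alignment (x86)
print(align(12, 8))  # 16 -> 4 bytes of padding with 8-byte alignment
```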

Cheers,

Bertram


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-23 Thread David Matthews

On 22/01/2019 23:01, Makarius wrote:

On 22/01/2019 23:08, Fabian Immler wrote:

On 1/22/2019 4:00 PM, Fabian Immler wrote:

On 1/22/2019 2:28 PM, Makarius wrote:

On 22/01/2019 20:05, Fabian Immler wrote:

These numbers look quite extreme:
The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as
it seems to be the case with polyml-test-0a6ebca445fc).

The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

ML_PLATFORM="x86-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
factor 2.66)
Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
factor 1.46) > (I am testing HOL-ODE-ARCH-COMP with

polyml-test-0a6ebca445fc/x86_64_32-linux right now, but that does take
some time...)

HOL-ODE-ARCH-COMP looked fine (2-3 times slower). But I realized that
this was the case with your computations, too. Unlike Lorenz_C0:


x86_64_32-linux --minheap 1500
Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)
x86_64-linux --minheap 3000
Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)

Lorenz_C0 had the most significant slowdown, it has the biggest number
of parallel computations, so I thought this might be related.

And indeed, if you try the theory at the end:
Par_List.map (with 6 cores, on Windows) is 12 times slower on _64_32
whereas the sequential evaluation is only 2 times slower.

All of this reminds me of the discussion we had in November 2017:
https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/2017-November/thread.html#7643


Back in Nov-2017, I made the following workaround that is still active:
http://isabelle.in.tum.de/repos/isabelle/rev/5da20135f560

Looking at the profiles of the included Float_Test.thy I now see a lot
of IntInf.divMod, but it is qualitatively similar to former IntInf.pow.

Maybe David can revisit both of these operations, so that we can get 
rid of the workarounds.


I've had a look at this under Windows and the problem seems to be with 
calls to IntInf.divMod from generated code, not from IntInf.pow.  The 
underlying RTS call used by IntInf.quotRem has changed in the 32-in-64 
version.  It previously returned the pair of values by passing in a 
stack location and having the RTS update this.  That isn't possible in 
the 32-in-64 version so now it allocates a pair on the heap.  For 
simplicity this is now used for the native code versions as well.  From 
what I can tell nearly all the threads are waiting for mutexes and I 
suspect that the extra heap allocation in IntInf.quotRem is causing some 
of the problem.  Waiting for a contended mutex can cost a significant 
amount of processor time both in busy-waiting and in kernel calls.


I'm not sure what to suggest except not to use concurrency here.  There 
doesn't seem to be a way to avoid allocating a pair in IntInf.quotRem.
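For reference, the pair being allocated is the (quotient, remainder) result of truncating division. A rough model of the semantics (a hypothetical Python helper, not Poly/ML code) shows why both values come back together, and that IntInf.quotRem truncates toward zero, unlike Python's floor-based divmod:

```python
def quot_rem(a, b):
    # Truncating division: the quotient rounds toward zero and the
    # remainder takes the sign of the dividend -- the convention of
    # IntInf.quotRem. The returned tuple models the heap-allocated pair.
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return (q, a - q * b)

print(quot_rem(7, 2))    # (3, 1)
print(quot_rem(-7, 2))   # (-3, -1); divmod(-7, 2) would give (-4, 1)
```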


By the way I updated IntInf.pow with
https://github.com/polyml/polyml/commit/c2a296122426f5a6205cf121218e3f38415d2b06

David


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Makarius
On 22/01/2019 23:08, Fabian Immler wrote:
> On 1/22/2019 4:00 PM, Fabian Immler wrote:
>> On 1/22/2019 2:28 PM, Makarius wrote:
>>> On 22/01/2019 20:05, Fabian Immler wrote:
 These numbers look quite extreme:
 The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as
 it seems to be the case with polyml-test-0a6ebca445fc).

 The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

 ML_PLATFORM="x86-linux"
 ML_OPTIONS="--minheap 2000 --maxheap 4000"
 Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
 factor 2.66)
 Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
 factor 1.46) > (I am testing HOL-ODE-ARCH-COMP with
>> polyml-test-0a6ebca445fc/x86_64_32-linux right now, but that does take
>> some time...)
> HOL-ODE-ARCH-COMP looked fine (2-3 times slower). But I realized that
> this was the case with your computations, too. Unlike Lorenz_C0:
> 
>> x86_64_32-linux --minheap 1500
>> Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)
>> x86_64-linux --minheap 3000
>> Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)
> Lorenz_C0 had the most significant slowdown, it has the biggest number
> of parallel computations, so I thought this might be related.
> 
> And indeed, if you try the theory at the end:
> Par_List.map (with 6 cores, on Windows) is 12 times slower on _64_32
> whereas the sequential evaluation is only 2 times slower.
> 
> All of this reminds me of the discussion we had in November 2017:
> https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/2017-November/thread.html#7643

Back in Nov-2017, I made the following workaround that is still active:
http://isabelle.in.tum.de/repos/isabelle/rev/5da20135f560

Looking at the profiles of the included Float_Test.thy I now see a lot
of IntInf.divMod, but it is qualitatively similar to former IntInf.pow.

Maybe David can revisit both of these operations, so that we can get
rid of the workarounds.


Note that I have produced the profiles as follows:

  isabelle build -o profiling=time ...

with a suitable test session that includes the above test theory, e.g.
see the included ROOT.

Then with "isabelle profiling_report" on the resulting log files, e.g.

  isabelle profiling_report
~/.isabelle/heaps/polyml-5.7.1_x86-linux/log/Float_Test.gz


Makarius
theory Float_Test
  imports
"HOL-Library.Float"
"HOL-Library.Code_Target_Numeral"
begin

definition "logistic p r x =
  normfloat (float_round_down p (float_round_down p (r * x)) * (float_plus_down p 1 (-x)))"

primrec iter where
  "iter p r x 0 = x"
| "iter p r x (Suc n) = iter p r (logistic p r x) n"

definition "logistic_chaos i = iter 30 (Float 15 (-4)) (Float 1 (-1)) (nat_of_integer i)"

ML ‹val logistic_chaos = @{code logistic_chaos}›
ML ‹Par_List.map logistic_chaos (replicate 10 10)›
― ‹x86_64_32: 24s
    x86_64: 2s›

(*
ML ‹map logistic_chaos (replicate 10 10)›
― ‹x86_64_32: 16s
    x86_64: 8s›
*)

end

session Float_Test = "HOL-Library" +
  theories
    Float_Test

Float_Test:
 1 eq-xsymb
 1 Term_Ord.typ_ord
 1 Raw_Simplifier.extract_rews
 1 Output_Primitives.ignore_outputs
 1 Code_Symbol.symbol_ord
 1 Proofterm.thm_ord
 1 Multithreading.sync_wait
 1 Graph().merge_trans_acyclic
 1 Scan.many
 1 Basics.fold_map
 1 Basics.fold
 1 Pretty.string
 1 Type_Infer_Context.prepare_term
 1 eq-xprod
 1 Term.add_tvarsT
 1 Print_Mode.print_mode_value
 1 X86ICodeToX86Code().icodeToX86Code
 1 ProofRewriteRules.rew
 1 String.fields
 1 List.foldr
 1 Library.insert
 1 Type_Infer_Context.unify
 1 Term.fold_aterms
 1 Term.fastype_of1
 1 Raw_Simplifier.bottomc
 1 Term_Ord.fast_indexname_ord
 1 Term_Subst.map_aterms_same
 1 Type_Infer.add_parms
 1 Axclass.unoverload
 1 CODETREE_REMOVE_REDUNDANT().cleanProc
 1 X86ICodeIdentifyReferences().getInstructionState
 1 IntSet.minusLists
 1 Term_Subst.map_types_same
 1 Symbol.explode
 1 Graph().add_edge
 1 TYPE_TREE().eventual
 1 Raw_Simplifier.rewrite_rule_extra_vars
 1 Path.check_elem
 1 Lazy.force_result
 1 Raw_Simplifier.insert_rrule
 1 Term.map_types
 1 Induct().concl_var
 2 Generic_Data().get
 2 Logic.list_implies
 2 
 2 Table().join
 2 IntSet.partition
 2 Thm.deriv_rule2
 2 X86CodetreeToICode().codeFunctionToX86
 2 Thm.transfer
 2 Term.fold_term_types
 2 IntSet.mergeLists
 2 

Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Fabian Immler

On 1/22/2019 4:00 PM, Fabian Immler wrote:

On 1/22/2019 2:28 PM, Makarius wrote:

On 22/01/2019 20:05, Fabian Immler wrote:

These numbers look quite extreme:
The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as
it seems to be the case with polyml-test-0a6ebca445fc).

The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

ML_PLATFORM="x86-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
factor 2.66)
Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
factor 1.46)

> (I am testing HOL-ODE-ARCH-COMP with
> polyml-test-0a6ebca445fc/x86_64_32-linux right now, but that does take
> some time...)
HOL-ODE-ARCH-COMP looked fine (2-3 times slower), and I realized that
this was the case with your computations, too. Lorenz_C0, however, is different:


> x86_64_32-linux --minheap 1500
> Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)
> x86_64-linux --minheap 3000
> Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)
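Converting the two quoted elapsed times to seconds (my arithmetic, not part of the original message) puts the x86_64_32 penalty for Lorenz_C0 at roughly a factor of 33:

```python
def secs(t):
    """Parse an 'H:MM:SS' elapsed-time string into seconds."""
    h, m, s = (int(part) for part in t.split(":"))
    return 3600 * h + 60 * m + s

# Lorenz_C0 elapsed times quoted above: x86_64_32 vs x86_64
slowdown = secs("3:46:03") / secs("0:06:52")
print(f"elapsed-time ratio: {slowdown:.1f}")  # roughly 33
```

The cpu-time ratio (28:01:18 vs 0:51:44) comes out almost the same, which fits the quoted parallelism factors (7.44 vs 7.52) staying constant while each thread runs slower.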
Lorenz_C0 had the most significant slowdown; it also has the biggest number
of parallel computations, so I thought this might be related.


And indeed, if you try the theory at the end:
Par_List.map (with 6 cores, on Windows) is 12 times slower on x86_64_32,
whereas the sequential evaluation is only 2 times slower.
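The shape of that comparison can be reproduced outside Isabelle with a small timing harness; the sketch below is mine (not from the thread), and since CPython threads do not run CPU-bound work in parallel, it only illustrates the sequential-versus-parallel measurement setup, not the Poly/ML numbers:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # the same logistic-map kernel as logistic_chaos, with plain doubles
    x = 0.5
    for _ in range(n):
        x = 0.9375 * x * (1 - x)
    return x

def timed(label, thunk):
    t0 = time.perf_counter()
    result = thunk()
    print(f"{label}: {time.perf_counter() - t0:.2f}s")
    return result

args = [200_000] * 10
seq = timed("sequential map", lambda: [work(n) for n in args])
with ThreadPoolExecutor(max_workers=6) as pool:
    par = timed("parallel map", lambda: list(pool.map(work, args)))
assert seq == par  # both evaluation strategies yield identical results
```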


All of this reminds me of the discussion we had in November 2017:
https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/2017-November/thread.html#7643

Fabian


==
theory Scratch
  imports
"HOL-Library.Float"
"HOL-Library.Code_Target_Numeral"
begin

definition "logistic p r x =
  normfloat (float_round_down p (float_round_down p (r * x)) * (float_plus_down p 1 (-x)))"


primrec iter where
  "iter p r x 0 = x"
| "iter p r x (Suc n) = iter p r (logistic p r x) n"

definition "logistic_chaos i = iter 30 (Float 15 (-4)) (Float 1 (-1)) (nat_of_integer i)"


ML ‹val logistic_chaos = @{code logistic_chaos}›
ML ‹Par_List.map logistic_chaos (replicate 10 10)›
― ‹x86_64_32: 24s
   x86_64: 2s›
ML ‹map logistic_chaos (replicate 10 10)›
― ‹x86_64_32: 16s
   x86_64: 8s›

end
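As a side note for readers without an Isabelle setup: `Float m e` denotes m·2^e, so `logistic_chaos` iterates the logistic map with r = 15·2⁻⁴ = 0.9375 from x₀ = 0.5. A plain-double Python sketch of the same kernel (my approximation; it omits the p = 30 bit `float_round_down` steps, which are the whole point of the benchmark):

```python
def logistic_chaos(n, r=15 * 2**-4, x=2**-1):
    """Iterate the logistic map x := r*x*(1-x) n times.

    The Isabelle version additionally rounds every intermediate
    product down to p = 30 significant bits via float_round_down.
    """
    for _ in range(n):
        x = r * x * (1 - x)
    return x

# with r < 1 the map contracts toward its fixed point 0
print(logistic_chaos(1000))
```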



___
isabelle-dev mailing list
isabelle-...@in.tum.de
https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Fabian Immler

On 1/22/2019 2:28 PM, Makarius wrote:

On 22/01/2019 20:05, Fabian Immler wrote:

These numbers look quite extreme:
The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as
it seems to be the case with polyml-test-0a6ebca445fc).

The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

ML_PLATFORM="x86-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
factor 2.66)
Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
factor 1.46)


Can you point to some smaller parts of these sessions, where the effect
is visible? We can then use profiling to get an idea what actually
happens in x86_64_32.
It should be the by (tactic ...) invocations, which ultimately run 
generated code as an oracle (the one defined via @{computation_check} 
here: 
https://bitbucket.org/isa-afp/afp-devel/src/5d11846ac6abdad9c9dfee108d2772ac71c6179c/thys/Ordinary_Differential_Equations/Numerics/Example_Utilities.thy#lines-2449)


The term that is being evaluated should be printed when declaring
[[ode_numerics_trace]].
(But it takes a lot of time to get there...)

I would have assumed that it is the heavy use of int computations that 
makes the difference, and it should be precisely the kind that is tested 
in the attached Float_Test.thy.


On my Windows laptop and on lxcisa0, however, I see the expected
slowdown of about a factor of 2, but not more...


Could the issue be related to specific machines?
(I am testing HOL-ODE-ARCH-COMP with 
polyml-test-0a6ebca445fc/x86_64_32-linux right now, but that does take 
some time...)


Fabian


theory Float_Test
  imports
"HOL-Library.Float"
"HOL-Library.Code_Target_Numeral"
begin

definition "logistic p r x =
  normfloat (float_round_down p (float_round_down p (r * x)) * (float_plus_down p 1 (-x)))"

primrec iter where
  "iter p r x 0 = x"
| "iter p r x (Suc n) = iter p r (logistic p r x) n"

definition "logistic_chaos i = iter 30 (Float 15 (-4)) (Float 1 (-1)) (nat_of_integer i)"

ML ‹val logistic_chaos = @{code logistic_chaos}›
ML ‹logistic_chaos 100›

end



Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Makarius
On 22/01/2019 20:05, Fabian Immler wrote:
> These numbers look quite extreme:
> The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as
> it seems to be the case with polyml-test-0a6ebca445fc).
> 
> The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:
> 
> ML_PLATFORM="x86-linux"
> ML_OPTIONS="--minheap 2000 --maxheap 4000"
> Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time,
> factor 2.66)
> Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time,
> factor 1.46)

Can you point to some smaller parts of these sessions, where the effect
is visible? We can then use profiling to get an idea what actually
happens in x86_64_32.


Makarius



Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Fabian Immler

These numbers look quite extreme:
The slowdown in polyml-5.7.1-8 is about a factor of 2-3 (and not 30 as 
it seems to be the case with polyml-test-0a6ebca445fc).


The following is isabelle/ab5a8a2519b0 afp/f49f0b0ead38 on lxcisa0:

ML_PLATFORM="x86-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:25:27 elapsed time, 1:07:49 cpu time, 
factor 2.66)
Finished HOL-ODE-ARCH-COMP (0:18:01 elapsed time, 0:26:19 cpu time, 
factor 1.46)



ML_PLATFORM="x86_64-linux"
ML_OPTIONS="--minheap 2000 --maxheap 4000"
Finished HOL-ODE-Numerics (0:29:34 elapsed time, 1:21:08 cpu time, 
factor 2.74)
Finished HOL-ODE-ARCH-COMP (0:06:49 elapsed time, 0:12:41 cpu time, 
factor 1.86)


Fabian

On 1/22/2019 11:36 AM, Makarius wrote:

On 19/01/2019 21:43, Makarius wrote:

Thanks to great work by David Matthews, there is now an Isabelle
component polyml-test-0a6ebca445fc (Isabelle/3b777286c3ec), which can be
enabled manually, e.g. in $ISABELLE_HOME_USER/etc/settings like this:

   init_component "$HOME/.isabelle/contrib/polyml-test-0a6ebca445fc"

It requires the usual "isabelle components -a" incantation afterwards.


polyml-test-0a6ebca445fc is the default as of 81ca77cb7c8c, which means
the scope of further testing has widened.

Almost everything has become faster by default, but exceptions are
heavy-duty int computations that rely on a certain word size, notably
the HOL-ODE family of sessions.

AFP/a04825886e71 marks various sessions explicitly as "large", which
means that they prefer x86_64.

Here are concrete numbers:

x86_64_32-linux --minheap 1500
Finished Pure (0:00:15 elapsed time, 0:00:15 cpu time, factor 1.00)
Finished HOL (0:02:28 elapsed time, 0:08:32 cpu time, factor 3.45)
Finished HOL-Analysis (0:03:48 elapsed time, 0:21:27 cpu time, factor 5.64)
Finished Ordinary_Differential_Equations (0:01:14 elapsed time, 0:05:31
cpu time, factor 4.43)
Finished Differential_Dynamic_Logic (0:01:29 elapsed time, 0:05:03 cpu
time, factor 3.39)
Finished HOL-ODE-Numerics (0:17:51 elapsed time, 0:46:30 cpu time,
factor 2.60)
Finished Lorenz_Approximation (0:04:02 elapsed time, 0:07:46 cpu time,
factor 1.92)
Finished Lorenz_C1 (0:00:04 elapsed time, 0:00:05 cpu time, factor 1.10)
Finished HOL-ODE-ARCH-COMP (0:12:56 elapsed time, 0:28:35 cpu time,
factor 2.21)
Finished HOL-ODE-Examples (0:37:13 elapsed time, 2:51:00 cpu time,
factor 4.59)
Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)

x86_64-linux --minheap 3000
Finished Pure (0:00:16 elapsed time, 0:00:16 cpu time, factor 1.00)
Finished HOL (0:02:44 elapsed time, 0:09:28 cpu time, factor 3.47)
Finished HOL-Analysis (0:04:05 elapsed time, 0:22:47 cpu time, factor 5.58)
Finished Ordinary_Differential_Equations (0:01:19 elapsed time, 0:05:49
cpu time, factor 4.40)
Finished Differential_Dynamic_Logic (0:01:33 elapsed time, 0:05:22 cpu
time, factor 3.44)
Finished HOL-ODE-Numerics (0:18:59 elapsed time, 0:49:00 cpu time,
factor 2.58)
Finished Lorenz_Approximation (0:04:01 elapsed time, 0:07:39 cpu time,
factor 1.90)
Finished Lorenz_C1 (0:00:03 elapsed time, 0:00:03 cpu time, factor 1.20)
Finished HOL-ODE-ARCH-COMP (0:04:06 elapsed time, 0:08:37 cpu time,
factor 2.10)
Finished HOL-ODE-Examples (0:05:18 elapsed time, 0:17:04 cpu time,
factor 3.22)
Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)


Makarius







Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Makarius
On 19/01/2019 21:43, Makarius wrote:
> Thanks to great work by David Matthews, there is now an Isabelle
> component polyml-test-0a6ebca445fc (Isabelle/3b777286c3ec), which can be
> enabled manually, e.g. in $ISABELLE_HOME_USER/etc/settings like this:
> 
>   init_component "$HOME/.isabelle/contrib/polyml-test-0a6ebca445fc"
> 
> It requires the usual "isabelle components -a" incantation afterwards.

polyml-test-0a6ebca445fc is the default as of 81ca77cb7c8c, which means
the scope of further testing has widened.

Almost everything has become faster by default, but exceptions are
heavy-duty int computations that rely on a certain word size, notably
the HOL-ODE family of sessions.

AFP/a04825886e71 marks various sessions explicitly as "large", which
means that they prefer x86_64.

Here are concrete numbers:

x86_64_32-linux --minheap 1500
Finished Pure (0:00:15 elapsed time, 0:00:15 cpu time, factor 1.00)
Finished HOL (0:02:28 elapsed time, 0:08:32 cpu time, factor 3.45)
Finished HOL-Analysis (0:03:48 elapsed time, 0:21:27 cpu time, factor 5.64)
Finished Ordinary_Differential_Equations (0:01:14 elapsed time, 0:05:31
cpu time, factor 4.43)
Finished Differential_Dynamic_Logic (0:01:29 elapsed time, 0:05:03 cpu
time, factor 3.39)
Finished HOL-ODE-Numerics (0:17:51 elapsed time, 0:46:30 cpu time,
factor 2.60)
Finished Lorenz_Approximation (0:04:02 elapsed time, 0:07:46 cpu time,
factor 1.92)
Finished Lorenz_C1 (0:00:04 elapsed time, 0:00:05 cpu time, factor 1.10)
Finished HOL-ODE-ARCH-COMP (0:12:56 elapsed time, 0:28:35 cpu time,
factor 2.21)
Finished HOL-ODE-Examples (0:37:13 elapsed time, 2:51:00 cpu time,
factor 4.59)
Finished Lorenz_C0 (3:46:03 elapsed time, 28:01:18 cpu time, factor 7.44)

x86_64-linux --minheap 3000
Finished Pure (0:00:16 elapsed time, 0:00:16 cpu time, factor 1.00)
Finished HOL (0:02:44 elapsed time, 0:09:28 cpu time, factor 3.47)
Finished HOL-Analysis (0:04:05 elapsed time, 0:22:47 cpu time, factor 5.58)
Finished Ordinary_Differential_Equations (0:01:19 elapsed time, 0:05:49
cpu time, factor 4.40)
Finished Differential_Dynamic_Logic (0:01:33 elapsed time, 0:05:22 cpu
time, factor 3.44)
Finished HOL-ODE-Numerics (0:18:59 elapsed time, 0:49:00 cpu time,
factor 2.58)
Finished Lorenz_Approximation (0:04:01 elapsed time, 0:07:39 cpu time,
factor 1.90)
Finished Lorenz_C1 (0:00:03 elapsed time, 0:00:03 cpu time, factor 1.20)
Finished HOL-ODE-ARCH-COMP (0:04:06 elapsed time, 0:08:37 cpu time,
factor 2.10)
Finished HOL-ODE-Examples (0:05:18 elapsed time, 0:17:04 cpu time,
factor 3.22)
Finished Lorenz_C0 (0:06:52 elapsed time, 0:51:44 cpu time, factor 7.52)


Makarius


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Makarius
On 22/01/2019 12:31, Bertram Felgenhauer wrote:
> Makarius wrote:
>> So this is the right time for further testing of applications:
>> Isabelle2018 should work as well, but I have not done any testing beyond
>> "isabelle build -g main" -- Isabelle development only moves forward in
>> one direction on a single branch.
> 
> I have tried this with Isabelle2018 and IsaFoR; I've encountered no
> problems and there's a nice speedup (estimated 1.25 times faster).
> Heap images are 40% smaller, which is a welcome change as well.

Is that compared to x86_64 (full 64-bit) or x86 (old 32-bit)?

I am asking because I have noticed a speedup when building heap images
with x86_64_32 compared to x86, and was wondering about the reasons
for it. (For x86_64 everything is just more bulky, of course, including
heaps.)


Makarius



Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Lawrence Paulson
Looks impressive. Thanks!
Larry

> On 22 Jan 2019, at 11:27, Makarius  wrote:
> 
> Here are some performance measurements on the best hardware that I have
> presently access to (not at TUM):



Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Bertram Felgenhauer
Makarius wrote:
> So this is the right time for further testing of applications:
> Isabelle2018 should work as well, but I have not done any testing beyond
> "isabelle build -g main" -- Isabelle development only moves forward in
> one direction on a single branch.

I have tried this with Isabelle2018 and IsaFoR; I've encountered no
problems and there's a nice speedup (estimated 1.25 times faster).
Heap images are 40% smaller, which is a welcome change as well.
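A back-of-envelope reading of that 40% figure (my arithmetic, not Bertram's): if a fraction f of the heap consists of word-sized fields that shrink from 8 to 4 bytes under the 32-bit value model, the new heap is 1 - f/2 of its old size, so "40% smaller" corresponds to f ≈ 0.8:

```python
def size_ratio(word_fraction):
    """New/old heap size if word_fraction of the heap halves (8 -> 4 bytes)."""
    return word_fraction * 0.5 + (1.0 - word_fraction)

# observed ratio 0.6 ("40% smaller") implies roughly 80% word-sized fields
print(size_ratio(0.8))
```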

Thanks a lot, David and Makarius!

Bertram


Re: [isabelle-dev] Poly/ML x86_64_32 available for testing

2019-01-22 Thread Makarius
On 19/01/2019 21:43, Makarius wrote:
> Thanks to great work by David Matthews, there is now an Isabelle
> component polyml-test-0a6ebca445fc (Isabelle/3b777286c3ec), which can be
> enabled manually, e.g. in $ISABELLE_HOME_USER/etc/settings like this:
> 
>   init_component "$HOME/.isabelle/contrib/polyml-test-0a6ebca445fc"
> 
> It requires the usual "isabelle components -a" incantation afterwards.

Here are some performance measurements on the best hardware that I have
presently access to (not at TUM):

  Intel Xeon Gold 6148 CPU @ 2.40GHz
  20 CPU cores * 2 hyperthreads * 2 nodes
  64 GB memory
  SSD


Isabelle/2633e166136a + AFP/ba82c831e5c2
isabelle build -N -j2 -o threads=8

x86-linux --minheap 1500 --maxheap 3500
Finished HOL (0:02:36 elapsed time, 0:08:46 cpu time, factor 3.37)
Finished HOL (0:02:36 elapsed time, 0:08:46 cpu time, factor 3.37)
Finished HOL-Analysis (0:03:49 elapsed time, 0:21:11 cpu time, factor 5.55)
Finished HOL-Analysis (0:03:50 elapsed time, 0:21:19 cpu time, factor 5.55)
Finished HOL-Data_Structures (0:01:39 elapsed time, 0:11:30 cpu time,
factor 6.91)
Finished HOL-Data_Structures (0:01:43 elapsed time, 0:11:52 cpu time,
factor 6.87)
Finished HOL-Nominal (0:00:28 elapsed time, 0:01:07 cpu time, factor 2.34)
Finished HOL-Nominal (0:00:29 elapsed time, 0:01:08 cpu time, factor 2.33)
Finished HOL-Nominal-Examples (0:02:57 elapsed time, 0:14:11 cpu time,
factor 4.80)
Finished HOL-Nominal-Examples (0:03:09 elapsed time, 0:14:55 cpu time,
factor 4.73)
Finished HOL-Proofs (0:09:52 elapsed time, 0:23:45 cpu time, factor 2.41)
Finished HOL-Proofs (0:09:52 elapsed time, 0:24:04 cpu time, factor 2.44)

x86_64_32-linux --minheap 1500 --maxheap 3500
Finished HOL (0:02:32 elapsed time, 0:08:50 cpu time, factor 3.49)
Finished HOL (0:02:32 elapsed time, 0:08:50 cpu time, factor 3.48)
Finished HOL-Analysis (0:03:41 elapsed time, 0:20:56 cpu time, factor 5.67)
Finished HOL-Analysis (0:03:39 elapsed time, 0:20:50 cpu time, factor 5.70)
Finished HOL-Data_Structures (0:01:36 elapsed time, 0:11:33 cpu time,
factor 7.16)
Finished HOL-Data_Structures (0:01:36 elapsed time, 0:11:35 cpu time,
factor 7.20)
Finished HOL-Nominal (0:00:12 elapsed time, 0:00:24 cpu time, factor 2.07)
Finished HOL-Nominal (0:00:11 elapsed time, 0:00:24 cpu time, factor 2.06)
Finished HOL-Nominal-Examples (0:02:53 elapsed time, 0:13:54 cpu time,
factor 4.82)
Finished HOL-Nominal-Examples (0:02:51 elapsed time, 0:13:51 cpu time,
factor 4.83)
Finished HOL-Proofs (0:09:55 elapsed time, 0:23:48 cpu time, factor 2.40)
Finished HOL-Proofs (0:09:48 elapsed time, 0:23:35 cpu time, factor 2.40)

x86_64_32-linux --minheap 3000 --maxheap 7000
Finished HOL (0:02:32 elapsed time, 0:08:53 cpu time, factor 3.50)
Finished HOL (0:02:32 elapsed time, 0:08:51 cpu time, factor 3.49)
Finished HOL-Analysis (0:03:26 elapsed time, 0:20:06 cpu time, factor 5.84)
Finished HOL-Analysis (0:03:45 elapsed time, 0:21:20 cpu time, factor 5.67)
Finished HOL-Data_Structures (0:01:35 elapsed time, 0:11:32 cpu time,
factor 7.26)
Finished HOL-Data_Structures (0:01:36 elapsed time, 0:11:36 cpu time,
factor 7.20)
Finished HOL-Nominal (0:00:12 elapsed time, 0:00:25 cpu time, factor 2.08)
Finished HOL-Nominal (0:00:11 elapsed time, 0:00:25 cpu time, factor 2.10)
Finished HOL-Nominal-Examples (0:02:45 elapsed time, 0:13:26 cpu time,
factor 4.88)
Finished HOL-Nominal-Examples (0:02:45 elapsed time, 0:13:26 cpu time,
factor 4.87)
Finished HOL-Proofs (0:08:10 elapsed time, 0:19:17 cpu time, factor 2.36)
Finished HOL-Proofs (0:08:23 elapsed time, 0:19:13 cpu time, factor 2.29)

x86_64-linux --minheap 1500 --maxheap 7000
Finished HOL (0:02:48 elapsed time, 0:09:46 cpu time, factor 3.48)
Finished HOL (0:02:46 elapsed time, 0:09:34 cpu time, factor 3.45)
Finished HOL-Analysis (0:04:01 elapsed time, 0:22:39 cpu time, factor 5.64)
Finished HOL-Analysis (0:04:00 elapsed time, 0:22:27 cpu time, factor 5.60)
Finished HOL-Data_Structures (0:01:58 elapsed time, 0:12:49 cpu time,
factor 6.47)
Finished HOL-Data_Structures (0:01:58 elapsed time, 0:12:42 cpu time,
factor 6.46)
Finished HOL-Nominal (0:00:13 elapsed time, 0:00:27 cpu time, factor 2.05)
Finished HOL-Nominal (0:00:13 elapsed time, 0:00:27 cpu time, factor 2.04)
Finished HOL-Nominal-Examples (0:03:03 elapsed time, 0:14:39 cpu time,
factor 4.80)
Finished HOL-Nominal-Examples (0:02:58 elapsed time, 0:14:29 cpu time,
factor 4.88)
Finished HOL-Proofs (0:10:16 elapsed time, 0:25:24 cpu time, factor 2.47)
Finished HOL-Proofs (0:10:21 elapsed time, 0:25:35 cpu time, factor 2.47)

x86_64-linux --minheap 3000 --maxheap 14000
Finished HOL (0:02:44 elapsed time, 0:09:31 cpu time, factor 3.46)
Finished HOL (0:02:44 elapsed time, 0:09:31 cpu time, factor 3.47)
Finished HOL-Analysis (0:04:06 elapsed time, 0:22:40 cpu time, factor 5.51)
Finished HOL-Analysis (0:04:06 elapsed time, 0:22:54 cpu time, factor 5.57)
Finished HOL-Data_Structures (0:01:44 elapsed time, 0:12:28 cpu time,
factor 7.14)
Finished HOL-Data_Structures (0:01:43 elapsed