Comparing Guile's GC with BDW-GC

Ludovic Courtès Wed, 05 Nov 2008 14:44:03 -0800

Hello!

I finally [0] conducted experiments to compare Guile's GC with my port
of Guile to the Boehm-Demers-Weiser GC (BDW-GC).  The code for that port
is not currently available on-line but I'd be happy to push it somewhere
(would Guile's repo at Savannah be a good fit?).


The following experiments compare Guile 1.9 (commit
9c646eee436c0fa4760962bc0c4070892522eff1) with its default settings to
the corresponding Guile in my BDW-GC branch.  For the BDW-GC-based
Guile, the following parameters are tweaked:

  * the "free space divisor" (FSD), which is described as follows in
    `gc.h':

      "We try to make sure that we allocate at least
       N/GC_free_space_divisor bytes between collections, where N is
       twice the number of traced bytes, plus the number of untraced
       bytes (bytes in "atomic" objects), plus a rough estimate of the
       root set size.  N approximates GC tracing work per GC.
       Initially, GC_free_space_divisor = 3.  Increasing its value will
       use less space but more collection time.  Decreasing it will
       appreciably decrease collection time at the expense of space."

  * whether to use incremental mode or not (on GNU/Linux, by default, it
    uses `mprotect ()' and a SIGSEGV handler to detect dirty pages).

BDW-GC is version 7.1, with its default compile-time settings (i.e., no
additional `configure' flags or so).

In all cases, the "heap size" is estimated by parsing `/proc/self/smaps'
and adding the size of the "[heap]" and "[anon]" mappings, plus the size
of all private anonymous mappings.

However, this method is quite unreliable in the presence of additional
threads since it takes into account the thread stacks and thread-local
storage [1].  Because of this, the estimate is off when using parallel
marking in BDW-GC, so I did not measure it, although it would have been
interesting (if you have an idea on how to fix this, I'm interested in
it!).

This all run on a Intel Core 2 Duo (i686) at 1.2 GHz.


The Results
-----------

There are two benchmarks where BDW-GC is both faster and less
memory-hungry than Guile's GC (marked with an `!'):

  benchmark: `./gc-benchmarks/gcbench.scm' [2]
                       heap size (MiB) execution time (s.)
  Guile                30.40 (1.00x)     50.856 (1.00x)
  BDW-GC, FSD=3        29.46 (0.97x)     35.706 (0.70x) !
  BDW-GC, FSD=6        28.66 (0.94x)     36.448 (0.72x) !
  BDW-GC, FSD=9        28.57 (0.94x)     36.389 (0.72x) !
  BDW-GC, FSD=3 incr.  44.68 (1.47x)     41.509 (0.82x)

  benchmark: `./gc-benchmarks/string.scm' [3]
                       heap size (MiB) execution time (s.)
  Guile                756.66 (1.00x)      9.244 (1.00x)
  BDW-GC, FSD=3        429.91 (0.57x)      6.646 (0.72x) !
  BDW-GC, FSD=6        537.04 (0.71x)      6.769 (0.73x) !
  BDW-GC, FSD=9        433.33 (0.57x)      6.718 (0.73x) !
  BDW-GC, FSD=3 incr.  417.87 (0.55x)      6.975 (0.75x) !

For `string', I expect it to be partly due to the fact that under
unmodified Guile it's libc's malloc that's exercised more than the GC
itself (since stringbuf contents are allocated with `malloc ()').

In most other cases, BDW-GC yields smaller execution times at the
expense of increased heap usage.

  benchmark: `./gc-benchmarks/boyer.scm' [4]
                       heap size (MiB) execution time (s.)
  Guile                1.54 (1.00x)      6.316 (1.00x)
  BDW-GC, FSD=3        2.41 (1.57x)      4.943 (0.78x)
  BDW-GC, FSD=6        2.06 (1.34x)      6.885 (1.09x)
  BDW-GC, FSD=9        1.97 (1.29x)      8.578 (1.36x)
  BDW-GC, FSD=3 incr.  2.31 (1.50x)      4.604 (0.73x)

  benchmark: `./gc-benchmarks/compiler.scm' [4]
                       heap size (MiB) execution time (s.)
  Guile                2.86 (1.00x)      6.803 (1.00x)
  BDW-GC, FSD=3        4.69 (1.64x)      5.079 (0.75x)
  BDW-GC, FSD=6        3.58 (1.25x)      7.018 (1.03x)
  BDW-GC, FSD=9        3.27 (1.14x)      8.687 (1.28x)
  BDW-GC, FSD=3 incr.  3.75 (1.31x)      5.428 (0.80x)

  benchmark: `./gc-benchmarks/fib.scm' [4]
                       heap size (MiB) execution time (s.)
  Guile                1.39 (1.00x)      6.208 (1.00x)
  BDW-GC, FSD=3        2.41 (1.74x)      4.466 (0.72x)
  BDW-GC, FSD=6        2.06 (1.48x)      5.925 (0.95x)
  BDW-GC, FSD=9        1.97 (1.42x)      6.964 (1.12x)
  BDW-GC, FSD=3 incr.  2.38 (1.72x)      4.347 (0.70x)

  benchmark: `./gc-benchmarks/loop.scm' [5]
                       heap size (MiB) execution time (s.)
  Guile                1.39 (1.00x)      8.659 (1.00x)
  BDW-GC, FSD=3        2.41 (1.74x)      6.335 (0.73x)
  BDW-GC, FSD=6        2.06 (1.48x)      8.175 (0.94x)
  BDW-GC, FSD=9        1.97 (1.42x)      9.601 (1.11x)
  BDW-GC, FSD=3 incr.  2.38 (1.72x)      6.211 (0.72x)

  benchmark: `./gc-benchmarks/guile-test.scm' [6]
                       heap size (MiB) execution time (s.)
  Guile                45.06 (1.00x)     18.448 (1.00x)
  BDW-GC, FSD=3        54.31 (1.21x)     17.785 (0.96x)
  BDW-GC, FSD=6        60.45 (1.34x)     16.819 (0.91x)
  BDW-GC, FSD=9        52.76 (1.17x)     17.791 (0.96x)
  BDW-GC, FSD=3 incr.  49.08 (1.09x)     18.852 (1.02x)

In all cases, the `mprotect'-based incremental mode doesn't really help.
I think GC's so-called "manual virtual dirty bits" could make the
incremental mode more useful in our case [7] but I haven't tried yet.


Conclusion (Or Lack Thereof)
----------------------------

The thing is, I don't think there is any "simple" conclusion that can be
drawn from these few benchmarks.  In addition, these are just specific
benchmarks, and performance is certainly highly dependent on the
application and its memory usage patterns.

I think the question of whether BDW-GC is "worth it" becomes more of an
engineering matter if performance is neither problematic not
exceptional.

Such a long mail for just that?!  Yes!  :-)

Thanks,
Ludo'.


[0] It all started some time ago:
    http://thread.gmane.org/gmane.lisp.guile.devel/5946 .

[1] 
http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2395/focus=2464

[2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_bench.html

[3] http://www.ccs.neu.edu/home/will/Twobit/KVW/string.txt

[4] These are widespread Scheme benchmarks and can be found, e.g., at
    
https://trac.ccs.neu.edu/trac/larceny/browser/trunk/larceny_src/test/Benchmarking/CrossPlatform/src

[5] "(let loop ((i 10000000)) (and (> i 0) (loop (1- i))))"

[6] Guile's test suite without `popen.test' as it causes termination
    problems.  The heap size estimate is not very reliable here, as
    discussed in [1].

[7] 
http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2324

Comparing Guile's GC with BDW-GC

Reply via email to