Hello! I finally [0] conducted experiments to compare Guile's GC with my port of Guile to the Boehm-Demers-Weiser GC (BDW-GC). The code for that port is not currently available on-line but I'd be happy to push it somewhere (would Guile's repo at Savannah be a good fit?).
The following experiments compare Guile 1.9 (commit 9c646eee436c0fa4760962bc0c4070892522eff1) with its default settings to the corresponding Guile in my BDW-GC branch. For the BDW-GC-based Guile, the following parameters are tweaked: * the "free space divisor" (FSD), which is described as follows in `gc.h': "We try to make sure that we allocate at least N/GC_free_space_divisor bytes between collections, where N is twice the number of traced bytes, plus the number of untraced bytes (bytes in "atomic" objects), plus a rough estimate of the root set size. N approximates GC tracing work per GC. Initially, GC_free_space_divisor = 3. Increasing its value will use less space but more collection time. Decreasing it will appreciably decrease collection time at the expense of space." * whether to use incremental mode or not (on GNU/Linux, by default, it uses `mprotect ()' and a SIGSEGV handler to detect dirty pages). BDW-GC is version 7.1, with its default compile-time settings (i.e., no additional `configure' flags or so). In all cases, the "heap size" is estimated by parsing `/proc/self/smaps' and adding the size of the "[heap]" and "[anon]" mappings, plus the size of all private anonymous mappings. However, this method is quite unreliable in the presence of additional threads since it takes into account the thread stacks and thread-local storage [1]. Because of this, the estimate is off when using parallel marking in BDW-GC, so I did not measure it, although it would have been interesting (if you have an idea on how to fix this, I'm interested in it!). This all run on a Intel Core 2 Duo (i686) at 1.2 GHz. The Results ----------- There are two benchmarks where BDW-GC is both faster and less memory-hungry than Guile's GC (marked with an `!'): benchmark: `./gc-benchmarks/gcbench.scm' [2] heap size (MiB) execution time (s.) Guile 30.40 (1.00x) 50.856 (1.00x) BDW-GC, FSD=3 29.46 (0.97x) 35.706 (0.70x) ! BDW-GC, FSD=6 28.66 (0.94x) 36.448 (0.72x) ! BDW-GC, FSD=9 28.57 (0.94x) 36.389 (0.72x) ! BDW-GC, FSD=3 incr. 44.68 (1.47x) 41.509 (0.82x) benchmark: `./gc-benchmarks/string.scm' [3] heap size (MiB) execution time (s.) Guile 756.66 (1.00x) 9.244 (1.00x) BDW-GC, FSD=3 429.91 (0.57x) 6.646 (0.72x) ! BDW-GC, FSD=6 537.04 (0.71x) 6.769 (0.73x) ! BDW-GC, FSD=9 433.33 (0.57x) 6.718 (0.73x) ! BDW-GC, FSD=3 incr. 417.87 (0.55x) 6.975 (0.75x) ! For `string', I expect it to be partly due to the fact that under unmodified Guile it's libc's malloc that's exercised more than the GC itself (since stringbuf contents are allocated with `malloc ()'). In most other cases, BDW-GC yields smaller execution times at the expense of increased heap usage. benchmark: `./gc-benchmarks/boyer.scm' [4] heap size (MiB) execution time (s.) Guile 1.54 (1.00x) 6.316 (1.00x) BDW-GC, FSD=3 2.41 (1.57x) 4.943 (0.78x) BDW-GC, FSD=6 2.06 (1.34x) 6.885 (1.09x) BDW-GC, FSD=9 1.97 (1.29x) 8.578 (1.36x) BDW-GC, FSD=3 incr. 2.31 (1.50x) 4.604 (0.73x) benchmark: `./gc-benchmarks/compiler.scm' [4] heap size (MiB) execution time (s.) Guile 2.86 (1.00x) 6.803 (1.00x) BDW-GC, FSD=3 4.69 (1.64x) 5.079 (0.75x) BDW-GC, FSD=6 3.58 (1.25x) 7.018 (1.03x) BDW-GC, FSD=9 3.27 (1.14x) 8.687 (1.28x) BDW-GC, FSD=3 incr. 3.75 (1.31x) 5.428 (0.80x) benchmark: `./gc-benchmarks/fib.scm' [4] heap size (MiB) execution time (s.) Guile 1.39 (1.00x) 6.208 (1.00x) BDW-GC, FSD=3 2.41 (1.74x) 4.466 (0.72x) BDW-GC, FSD=6 2.06 (1.48x) 5.925 (0.95x) BDW-GC, FSD=9 1.97 (1.42x) 6.964 (1.12x) BDW-GC, FSD=3 incr. 2.38 (1.72x) 4.347 (0.70x) benchmark: `./gc-benchmarks/loop.scm' [5] heap size (MiB) execution time (s.) Guile 1.39 (1.00x) 8.659 (1.00x) BDW-GC, FSD=3 2.41 (1.74x) 6.335 (0.73x) BDW-GC, FSD=6 2.06 (1.48x) 8.175 (0.94x) BDW-GC, FSD=9 1.97 (1.42x) 9.601 (1.11x) BDW-GC, FSD=3 incr. 2.38 (1.72x) 6.211 (0.72x) benchmark: `./gc-benchmarks/guile-test.scm' [6] heap size (MiB) execution time (s.) Guile 45.06 (1.00x) 18.448 (1.00x) BDW-GC, FSD=3 54.31 (1.21x) 17.785 (0.96x) BDW-GC, FSD=6 60.45 (1.34x) 16.819 (0.91x) BDW-GC, FSD=9 52.76 (1.17x) 17.791 (0.96x) BDW-GC, FSD=3 incr. 49.08 (1.09x) 18.852 (1.02x) In all cases, the `mprotect'-based incremental mode doesn't really help. I think GC's so-called "manual virtual dirty bits" could make the incremental mode more useful in our case [7] but I haven't tried yet. Conclusion (Or Lack Thereof) ---------------------------- The thing is, I don't think there is any "simple" conclusion that can be drawn from these few benchmarks. In addition, these are just specific benchmarks, and performance is certainly highly dependent on the application and its memory usage patterns. I think the question of whether BDW-GC is "worth it" becomes more of an engineering matter if performance is neither problematic not exceptional. Such a long mail for just that?! Yes! :-) Thanks, Ludo'. [0] It all started some time ago: http://thread.gmane.org/gmane.lisp.guile.devel/5946 . [1] http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2395/focus=2464 [2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_bench.html [3] http://www.ccs.neu.edu/home/will/Twobit/KVW/string.txt [4] These are widespread Scheme benchmarks and can be found, e.g., at https://trac.ccs.neu.edu/trac/larceny/browser/trunk/larceny_src/test/Benchmarking/CrossPlatform/src [5] "(let loop ((i 10000000)) (and (> i 0) (loop (1- i))))" [6] Guile's test suite without `popen.test' as it causes termination problems. The heap size estimate is not very reliable here, as discussed in [1]. [7] http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2324