Hi all,

I'd like to give an update on the status of Guile 2.2, including some benchmark numbers. See the end for my conclusions; I'd welcome comments with your take.

Setup
-----

All numbers are from my laptop running Arch Linux (with pango downgraded to 1:1.48.2-1 to keep out the memory hogging in 1.48.3) and measured with "/usr/bin/time -v". I use commit fce156f219 from https://gitlab.com/lilypond/lilypond/-/merge_requests/723 (see User's View) and the system-provided versions of Guile 1.8.8 and Guile 2.2.6. I don't test Guile 3.0 because it's not available as a package, and there seem to be more changes needed to make it work at all. I configured the build with --enable-gs-api (because I'm interested in the time spent in LilyPond, not how fast the system can fork gs).

Developer's View
----------------

The first set of experiments is about things that developers care about, namely 'make test' and 'make doc'. For all runs, I used the options "-j4 CPU_COUNT=4".

'make test' when compiled with Guile 1.8.8:
 * User time (seconds): 520.68
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 3:59.54
 * Maximum resident set size (kbytes): 372560
'make test' when compiled with Guile 2.2.6:
 * User time (seconds): 710.35
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 5:43.04
 * Maximum resident set size (kbytes): 372344
-> around 40% slower when using Guile 2.2.6 during development

'make doc' when compiled with Guile 1.8.8:
 * User time (seconds): 2915.35
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 18:17.24
 * Maximum resident set size (kbytes): 467768
'make doc' when compiled with Guile 2.2.6:
 * User time (seconds): 3258.72
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 20:37.10
 * Maximum resident set size (kbytes): 351560
-> around 10% slower when using Guile 2.2.6

Both times can be improved by running
 $ touch test.ly
 $ GUILE_AUTO_COMPILE=1 ./out/bin/lilypond test.ly
to compile the Guile modules (before you ask: no, guild compile does not work; see the conclusion). This step itself takes 2 minutes and is unfortunately single-threaded.
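
For anyone who wants to reproduce the measurements, here is a minimal sketch of how the three figures listed per run can be pulled out of a "/usr/bin/time -v" report. The helper name summarize_time and the log file name are made up for illustration; -o is GNU time's option for writing the report to a file instead of stderr.

```shell
# Hypothetical helper: keep only the three fields quoted in this mail
# from a saved "/usr/bin/time -v" report and strip the leading indent.
summarize_time() {
    grep -E 'User time \(seconds\)|Elapsed \(wall clock\) time|Maximum resident set size' "$1" \
        | sed -e 's/^[[:space:]]*//'
}

# Example capture (the log file name is arbitrary):
#   /usr/bin/time -v -o time-test.log make test -j4 CPU_COUNT=4
#   summarize_time time-test.log
```

The capture is left commented out so that sourcing the snippet does not start a build.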

Afterwards, 'make test' takes:
 * User time (seconds): 494.85
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 3:48.45
 * Maximum resident set size (kbytes): 372056
and 'make doc' takes:
 * User time (seconds): 2929.24
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 18:34.07
 * Maximum resident set size (kbytes): 351432
which is more or less level with the numbers using Guile 1.8.8.

User's View
-----------

Most LilyPond users care about installed versions and their own scores, so I ran 'make install' (with DESTDIR, actually, because I need two installations with Guile 2.2) and then renamed the build directories for extra caution. I only give the wall clock time because LilyPond is single-threaded, so user time is essentially the same.

The first test, simple.ly, is just a "{ c'1 }" to measure startup time.

With Guile 1.8.8:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.90
 * Maximum resident set size (kbytes): 132300
With Guile 2.2.6:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.85
 * Maximum resident set size (kbytes): 102760

Now that's a serious regression and clearly unacceptable for users trying out LilyPond.
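
To put numbers on the slowdowns, the elapsed times can be converted to seconds and divided. This is only a sketch: the helper names to_seconds and slowdown are hypothetical, and the inputs are the wall clock values reported above.

```shell
# Hypothetical helper: convert an "Elapsed (wall clock)" value
# (h:mm:ss or m:ss.cc) into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Hypothetical helper: ratio of two elapsed times (new / old).
slowdown() {
    awk -v new="$(to_seconds "$1")" -v old="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", new / old }'
}

slowdown 0:02.85 0:00.90   # simple.ly, Guile 2.2.6 vs. 1.8.8: prints 3.17
slowdown 5:43.04 3:59.54   # 'make test', Guile 2.2.6 vs. 1.8.8: prints 1.43
```

So startup takes a bit more than three times as long without compiled bytecode, and the same two helpers work for any other pair of timings in this mail.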

To remedy this, I install the compiled bytecode from above (stored in out/share/lilypond/current/guile/) with
 $ mkdir -p /path/to/install/lib/lilypond/2.23.2/ccache/lily
 $ for f in $(find out/share/lilypond/current/guile/ -name "*.go"); do cp "$f" "/path/to/install/lib/lilypond/2.23.2/ccache/lily/$(basename "$f" .scm.go).go"; done
(note: Guile auto-compiles to .scm.go, but only looks for .go files)

Then, I get:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.84
 * Maximum resident set size (kbytes): 95628
which is *faster* than with Guile 1.8.8 🙂

Some more serious scores:

input/regression/mozart-hrn-3.ly (4 pages, some 200 bars)
With Guile 1.8.8:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.13
 * Maximum resident set size (kbytes): 276988
With Guile 2.2.6:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.97
 * Maximum resident set size (kbytes): 169216
With Guile 2.2.6 and compiled bytecode:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.75
 * Maximum resident set size (kbytes): 188632

First movement from BWV1060R (one of my entered scores, 7 pages, some 100 bars in two voices)
With Guile 1.8.8:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.03
 * Maximum resident set size (kbytes): 390452
With Guile 2.2.6:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.41
 * Maximum resident set size (kbytes): 189364
With Guile 2.2.6 and compiled bytecode:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.05
 * Maximum resident set size (kbytes): 189336

Missa Dum sacrum mysterium (from https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00700.html and run through convert-ly)
With Guile 1.8.8:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.15
 * Maximum resident set size (kbytes): 969572
With Guile 2.2.6:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.22
 * Maximum resident set size (kbytes): 1034672
With Guile 2.2.6 and compiled bytecode:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.91
 * Maximum resident set size (kbytes): 1083204

Les Festes Venitiennes (from https://lists.gnu.org/archive/html/lilypond-user/2016-11/msg00948.html and run through convert-ly)
With Guile 1.8.8:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 1:23.96
 * Maximum resident set size (kbytes): 2091760
With Guile 2.2.6:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 1:56.79
 * Maximum resident set size (kbytes): 2260880
With Guile 2.2.6 and compiled bytecode:
 * Elapsed (wall clock) time (h:mm:ss or m:ss): 1:33.24
 * Maximum resident set size (kbytes): 2353024

Conclusion
----------

In my opinion, the numbers show that we *must have* compiled bytecode for user installations in order to reach acceptable performance (memory usage seems to be fine). And while compilation by invoking LilyPond is somewhat odd, it works and would be viable as a start.

For development, I'm less convinced. Sure, 'make test' and 'make doc' get faster, but the compilation itself takes a considerable amount of time. Moreover, it is my understanding that Guile is notoriously bad at determining which files to recompile, in particular when macros are involved.

Personally, I would be ok with the moderate slowdown if that was the only thing preventing a hypothetical switch to Guile 2.2, and the arising question is really a matter of prioritizing: There are other items that need solutions before that switch could happen (in particular the release process for binary builds and documentation).

What do others think? Or would you say that proper bytecode compilation is required before moving to Guile 2.2? (with no clear estimate of how feasible that is and how long it would take)

Sorry for the long and densely packed message, and thanks for reading to the end!

Jonas
