>> ... it doesn't do a comparison against correct input.
>
> The lilypond-book directory tests features of lilypond-book, but it
> doesn't do any real comparison. Meaningful comparison would be to compare
> PDF files (after processing with LaTeX) across versions,
I disagree. There are two issues that should be tested.
* Check whether PDFs (and other output formats) can actually be
generated. AFAICS, this is what the current tests do.
* Process the test input files with the current `lilypond-book`
version, check its output for correctness (including manual
compilation with LilyPond so that the PDFs can be inspected *once*),
then store the output files from lilypond-book – and only from
lilypond-book, without calling LilyPond – as a baseline.
Newer versions simply compare their output against this baseline.
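
To make the second point concrete, here is a minimal sketch of such a
baseline check in Python. It is only an illustration, not existing test
code: the function name `compare_to_baseline` and the two directory
arguments are hypothetical, and it assumes that the `lilypond-book`
output and the stored baseline are plain directory trees that can be
compared byte for byte. LilyPond itself is never called.

```python
import filecmp
from pathlib import Path


def compare_to_baseline(output_dir, baseline_dir):
    """Return a sorted list of (relative_path, status) for every mismatch.

    Hypothetical helper: both arguments are directories, one holding fresh
    lilypond-book output, the other the previously verified baseline.
    """
    output_dir, baseline_dir = Path(output_dir), Path(baseline_dir)

    # Collect file paths relative to each root so the two trees line up.
    produced = {p.relative_to(output_dir)
                for p in output_dir.rglob("*") if p.is_file()}
    expected = {p.relative_to(baseline_dir)
                for p in baseline_dir.rglob("*") if p.is_file()}

    problems = []
    for rel in sorted(expected | produced):
        if rel not in produced:
            problems.append((rel.as_posix(), "missing"))
        elif rel not in expected:
            problems.append((rel.as_posix(), "unexpected"))
        # shallow=False forces a content comparison, not just size/mtime.
        elif not filecmp.cmp(output_dir / rel, baseline_dir / rel,
                             shallow=False):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty result would mean the new version reproduces the baseline
exactly. A real test would probably have to normalize things like
version strings or hashed file names before comparing; that is left out
here.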
Werner