Sorry for the delay in getting back. To answer your questions: the convolution is a box filter because it can be implemented efficiently: for an NxM image, you do a single O(N*M) precomputation, and can then compute each convolution in O(N*M) operations, independent of the window size.
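The precomputation is just a summed-area table (integral image), which makes each box-window sum an O(1) lookup. A minimal sketch in Go, to illustrate the idea (this is a hypothetical `boxFilter` helper, not the actual code in cmd/compare):

```go
package main

import "fmt"

// boxFilter blurs img (h rows, w cols) with a (2r+1)x(2r+1) box window,
// using a summed-area table so each output pixel costs O(1).
func boxFilter(img [][]float64, r int) [][]float64 {
	h, w := len(img), len(img[0])

	// sat[y][x] = sum of img over the rectangle [0,y) x [0,x).
	// Building it is the single O(N*M) precomputation.
	sat := make([][]float64, h+1)
	for y := range sat {
		sat[y] = make([]float64, w+1)
	}
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			sat[y+1][x+1] = img[y][x] + sat[y][x+1] + sat[y+1][x] - sat[y][x]
		}
	}

	clamp := func(v, lo, hi int) int {
		if v < lo {
			return lo
		}
		if v > hi {
			return hi
		}
		return v
	}

	out := make([][]float64, h)
	for y := 0; y < h; y++ {
		out[y] = make([]float64, w)
		for x := 0; x < w; x++ {
			// Window [y0,y1) x [x0,x1), clipped to the image borders.
			y0, y1 := clamp(y-r, 0, h), clamp(y+r+1, 0, h)
			x0, x1 := clamp(x-r, 0, w), clamp(x+r+1, 0, w)
			// Four lookups give the window sum, whatever the window size.
			sum := sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
			out[y][x] = sum / float64((y1-y0)*(x1-x0))
		}
	}
	return out
}

func main() {
	img := [][]float64{
		{1, 0, 0},
		{0, 1, 0},
		{0, 0, 1},
	}
	blurred := boxFilter(img, 1)
	fmt.Println(blurred[1][1]) // average over the full 3x3 window
}
```

Because the four-lookup trick is independent of r, running the filter at window sizes 3 .. 2049 costs the same per resolution, which is why the multi-resolution version is only about 10x the single-pass cost.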
Regarding performance, consider MAE (CPU @ 2 GHz):

  $ go run ./cmd/compare/ -algorithm=mae -gs_jobs=1 -cmp_jobs=1 -file_regexp 'eps' ../lilypond/{d1,d2} /tmp/tmp.UULz7I9RKh/output/
  2025/06/20 14:12:38 Convert 2 EPS files using 1 cores (batch=true) to PNG in 369.2581ms (184.62905ms/file)
  2025/06/20 14:12:38 compared 1 PNG image pairs using 1 cores (imagemagick=false) in 51.97695ms (51.97695ms / pair)

  $ file ../lilypond/les-nereides.png
  ../lilypond/les-nereides.png: PNG image data, 835 x 1181, 8-bit/color RGB, non-interlaced

There are ~1M pixels in the image, so that's ~100 cycles per pixel, which seems like a bit much, but a significant share of the time is actually spent decoding the PNG.

The filter version convolves at 10 resolutions (window = 3 .. 2049), and is roughly 10 times slower:

  $ go run ./cmd/compare/ -algorithm=filter -gs_jobs=1 -cmp_jobs=1 -file_regexp 'eps' ../lilypond/{d1,d2} /tmp/tmp.UULz7I9RKh/output/
  2025/06/20 14:14:55 Convert 2 EPS files using 1 cores (batch=true) to PNG in 378.991521ms (189.49576ms/file)
  2025/06/20 14:14:55 compared 1 PNG image pairs using 1 cores (imagemagick=false) in 448.789022ms (448.789022ms / pair)

I am sure there are numeric libraries that could compute the result more quickly, but extra dependencies are a nightmare for packaging, so we should avoid them if possible. Our pixels are mostly blank, so this isn't terribly efficient.

If you convert the image to vectors, you get:

  $ time ps2ps -dNoOutputFonts les-nereides.ps vec.ps
  real    0m0.181s
  $ time inkscape --export-filename=out.dxf vec.ps
  real    0m3.139s
  $ grep VERTEX out.dxf | wc -l
  9998

10k points probably represents ~80 kB of data. I agree that you'll be able to process that data more quickly than the ~1 MB of pixels we process using pixmaps. There are some considerations, though:

* Inkscape seems pretty slow for the PS -> DXF conversion. This can probably be sped up, but how?
  It would need some nontrivial hacking to interpret the PS and make it generate something that a vector algorithm can handle.

* If you convert to a vector format, changes in the representation of the score (e.g. how drawing routines are called) can generate spurious differences. It is also not guaranteed that Cairo and PS generate something you can compare. This is also why "for font-based elements the font/glyph id pair" is suspect: changes to glyph shapes are also in scope for regtesting, and unless you expand the glyphs into their curves, you'd miss them.

So, in short: you are right that a vector-based approach is potentially much faster. But without a prototype, it's hard to evaluate either its performance or how well it works. Bitmaps are straightforward to manipulate, and we already have the dependencies (zlib, libpng) to read them.

Altogether, this was about half a day of work, and it seems like it could be an improvement over what we have.

HTH