Ok, so I whipped up a quick test based on that old PS conversion project of
mine.

Here are some partial timings, taken on my Mac laptop (2019, 2.3 GHz 8-core
Intel Core i9), with all code single-threaded.

Here's the execution log:

Parsing scene '/Users/lukes/Documents/src/undvips/tex/schintro_outline.udps'
Parse complete 1156.21 ms
Scene contains 468180 glyphs (2.46958 us/glyph)
Scene contains 16 fonts
AS build complete 182.568 ms (0.389953 us/glyph)

Explanation:
 - the program loads a document in my custom UDPS format (which is produced
by those PostScript scripts). At the moment it's an entire book (this one:
https://docs.scheme.org/schintro/, but in PostScript form). See below for
an example of what this looks like. There are 321 pages in the document. I
expect the parsing to be about linear in the number of lines in the UDPS
file (this file has about 141k lines)
 - "scene" means the document I guess, sorry, force of habit
 - AS is the acceleration structure. Our builds would be superlinearly
faster: the builder algorithm is O(n log(n)), so building 321 little scenes
is faster than building one "all together" scene. With n the number of
objects (~468k), 321 builds of n/321 objects each cost
321 * (n/321) log(n/321), so the new time is (n log(n) - n log(321)). For
this example that's something like 40% faster, ~110 ms. These are still
averaging 1500 objects per page, so they're small scenes, but not _that_
small.
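
A quick way to sanity-check the estimate above. This is only a sketch: it
assumes the build cost is exactly proportional to n*log(n) and that all 321
pages hold the same number of objects, both of which are simplifications. The
function name and its parameters are made up for illustration.

```python
import math

def split_build_time(total_ms, n_objects, n_parts):
    """Estimated time to build n_parts equal-sized structures instead of one
    big one, assuming the cost is proportional to n * log(n)."""
    per_part = n_objects / n_parts
    split_cost = n_parts * per_part * math.log(per_part)
    whole_cost = n_objects * math.log(n_objects)
    return total_ms * split_cost / whole_cost

# Figures taken from the execution log above.
est = split_build_time(182.568, 468180, 321)
print(f"{est:.1f} ms, {100 * (1 - est / 182.568):.0f}% faster")
```

Under these assumptions the result lands in the same ballpark as the figure
quoted above. Real pages vary in size, so treat it as a rough guide only.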

For the comparison method I have in mind, I think the build time is a good
proxy for the whole test execution. The idea is pretty much that you load
two scenes, assume they are about the same, then go ahead and build the AS
on both at the same time (this does not need the primitives to be in the
same order).
Where the scenes differ, the builds will start to diverge in topology. In
the end there will be a bunch of identical stuff, and we just need to build
a report on what we'd like to know about the difference between the two.
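
To make the lockstep idea a bit more concrete, here is a toy sketch. It is not
the builder used for the timings above: it assumes primitives are plain
(x0, y0, x1, y1) bounding-box tuples, splits both scenes with the same
midpoint planes, and stops descending wherever the two sides hold the same
boxes (in any order). The names `diff_scenes` and `leaf_diff` are hypothetical,
and exact float equality stands in for whatever tolerance a real comparison
would need.

```python
from collections import Counter

def leaf_diff(a, b):
    """Multiset difference: boxes only in a, boxes only in b."""
    ca, cb = Counter(a), Counter(b)
    return list((ca - cb).elements()), list((cb - ca).elements())

def diff_scenes(a, b, leaf_size=4):
    """Split two lists of boxes with shared planes; report what differs."""
    if Counter(a) == Counter(b):
        return [], []                      # same content, any order: prune
    if not a or not b or len(a) + len(b) <= leaf_size:
        return leaf_diff(a, b)
    both = a + b
    x0 = min(p[0] for p in both)
    y0 = min(p[1] for p in both)
    x1 = max(p[2] for p in both)
    y1 = max(p[3] for p in both)
    axis = 0 if (x1 - x0) >= (y1 - y0) else 1   # split the longest extent
    mid = ((x0 + x1) if axis == 0 else (y0 + y1)) / 2.0

    def is_low(p):
        # Classify a box by the centre of its extent along the split axis.
        return (p[axis] + p[axis + 2]) / 2.0 < mid

    a_lo = [p for p in a if is_low(p)]
    a_hi = [p for p in a if not is_low(p)]
    b_lo = [p for p in b if is_low(p)]
    b_hi = [p for p in b if not is_low(p)]
    if not (a_lo or b_lo) or not (a_hi or b_hi):
        return leaf_diff(a, b)             # split made no progress
    lo_a, lo_b = diff_scenes(a_lo, b_lo, leaf_size)
    hi_a, hi_b = diff_scenes(a_hi, b_hi, leaf_size)
    return lo_a + hi_a, lo_b + hi_b

# Two tiny "pages": same boxes in a different order, one box changed.
page_a = [(0, 0, 1, 1), (2, 0, 3, 1), (4, 0, 5, 1),
          (6, 0, 7, 1), (8, 0, 9, 1), (10, 0, 11, 1)]
page_b = [(10, 0, 11, 1), (8, 0, 9, 1), (6, 0, 7, 1),
          (4, 0, 5, 2), (2, 0, 3, 1), (0, 0, 1, 1)]
only_a, only_b = diff_scenes(page_a, page_b)
print(only_a, only_b)
```

Comparing full multisets at every node is wasteful; a real version would more
likely compare per-node summaries (counts, bounds, hashes) as the build goes.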

Missing bits: if people think this idea has merit, we need something to
stand in the role of my little UDPS file format. I see a few different
avenues:

   1. Rewrite this PS extraction thing so that it outputs more about the
   graphics in the PS
      - Measure its performance (it seems to fail on GPL Ghostscript
      10.05.1, I don't know why at the moment). Regrettably I have no notes on
      how long this step took when I was last working on this
   2. Alternatively, and possibly much better, pivot and implement a
   similar idea using the mupdf library, working on the PDF directly and
   sidestepping PostScript entirely
   3. Alternatively still, instrument some late layer of ... "Cairo" (say)
   to emit UDPS directly from the C++

I think that I would do 1 only if PostScript was our true source of truth.
I'm leaning towards 2 instead because, in my mind at least, what LilyPond
produces for the final users is PDF, so we should verify that this specific
product is healthy. This introduces a new dependency on mupdf (or some
other thing we can use to traverse the objects in a PDF; maybe poppler
would be more palatable).
All the same, avenue 3 is potentially the fastest/cleanest (and it would
allow us to instrument the test backend a little, to guide the comparison
better). The worry, however, is that depending on how this "tap" is injected
into the code, problems downstream of it might be missed, which would be
ungood.

Lastly, my little test is already built to support multiple pages, and this
could be advantageous in that it would save us from running all the warm-up
code over and over again. I guess I'm saying: if we can make it work for the
whole set at once, we could be looking at ~1.5 seconds to produce the
comparison result for the entire test suite. If I remember right, it takes
longer than this to generate the artifacts themselves, is that right?

Cheers,
Luca

UDPS Sample for page 1:
This is just to give you a sense of what's in the file I'm parsing.
Command explanation:

   - bop - beginning of page
   - font - change font
   - txt - a short run of text
   - eop - end of page
   - shwpg - PostScript's showpage operator (does nothing for us)


bop   0
font  CMBX12 [0.0860938,0.0,0.0,-0.0860938,0.0,0.0] 20.74 spc 43.0339 xheight 38.151
txt   org [-117.0,927.0] bbox [-113.672,866.846]-[7.56836,927.002] end [9.98567,927.0] 2 An
txt   org [42.9857,927.0] bbox [45.6543,868.018]-[130.534,927.002] end [132.96,927.0] 2 In
txt   org [129.96,927.0] bbox [131.787,872.445]-[253.516,927.523] end [255.936,927.0] 3 tro
txt   org [258.936,927.0] bbox [262.191,867.236]-[574.43,927.523] end [576.856,927.0] 7 duction
txt   org [608.856,927.0] bbox [610.677,872.445]-[692.415,927.523] end [694.842,927.0] 2 to
txt   org [726.842,927.0] bbox [732.178,866.976]-[821.061,928.044] end [823.815,927.0] 2 Sc
txt   org [820.815,927.0] bbox [824.398,867.367]-[1041.13,927.523] end [1043.78,927.0] 4 heme
txt   org [1074.78,927.0] bbox [1077.72,867.367]-[1226.42,927.523] end [1229.74,927.0] 3 and
txt   org [1261.74,927.0] bbox [1265.53,867.236]-[1361.56,927.523] end [1364.71,927.0] 3 its
txt   org [1397.71,927.0] bbox [1400.37,867.367]-[1816.19,943.669] end [1818.62,927.0] 8 Implemen
txt   org [1815.62,927.0] bbox [1817.45,867.236]-[2065.14,927.523] end [2067.56,927.0] 6 tation
font  CMR10 [0.0454545,0.0,0.0,-0.0454545,0.0,0.0] 10.95 spc 22.7213 xheight 19.4662
txt   org [617.0,1114.0] bbox [618.555,1083.14]-[645.182,1114.0] end [647.99,1114.0] 1 P
txt   org [646.99,1114.0] bbox [648.877,1082.62]-[706.51,1114.52] end [707.976,1114.0] 3 aul
txt   org [722.976,1114.0] bbox [724.544,1083.14]-[764.632,1114.97] end [768.956,1114.0] 2 R.
txt   org [783.956,1114.0] bbox [784.733,1082.62]-[932.08,1122.72] end [935.893,1114.0] 7 Wilson,
txt   org [951.893,1114.0] bbox [953.385,1083.14]-[1046.84,1114.97] end [1047.86,1114.0] 4 Univ
txt   org [1046.86,1114.0] bbox [1048.1,1083.72]-[1130.84,1114.52] end [1133.79,1114.0] 5 ersit
txt   org [1132.79,1114.0] bbox [1133.63,1094.53]-[1155.76,1123.24] end [1156.78,1114.0] 1 y
txt   org [1172.78,1114.0] bbox [1174.01,1082.1]-[1211.91,1114.52] end [1209.77,1114.0] 2 of
txt   org [1224.77,1114.0] bbox [1226.4,1083.4]-[1255.76,1114.0] end [1257.76,1114.0] 1 T
txt   org [1253.76,1114.0] bbox [1255.0,1093.75]-[1337.01,1114.52] end [1338.73,1114.0] 4 exas
font  CMTT10 [0.0454545,0.0,0.0,-0.0454545,0.0,0.0] 10.95 spc 23.8607 xheight 19.4662
txt   org [736.0,1176.0] bbox [736.719,1148.13]-[1215.01,1176.25] end [1215.82,1176.0] 20 wil...@cs.utexas.edu
txt   org [534.0,1239.0] bbox [534.521,1207.62]-[1084.98,1249.02] end [1085.79,1239.0] 23 http://www.cs.utexas.ed
txt   org [1084.79,1239.0] bbox [1085.32,1207.62]-[1249.51,1242.77] end [1252.73,1239.0] 7 u/users
txt   org [1251.73,1239.0] bbox [1254.33,1207.62]-[1418.85,1242.77] end [1419.66,1239.0] 7 /wilson
eop
shwpg
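
For anyone wanting to poke at the sample, a `txt` record can be picked apart
with a small regular expression. This is a sketch inferred from the sample
alone, not the parser used for the timings: it assumes each record occupies a
single line in the real file, that numbers are plain decimals, and that the
fields are org, bbox, end, a glyph count and the text. `parse_txt` and the
dictionary keys are names invented here.

```python
import re

NUM = r"[-+]?\d+(?:\.\d+)?"
PT = rf"\[({NUM}),({NUM})\]"
TXT_RE = re.compile(
    rf"txt\s+org\s+{PT}\s+bbox\s+{PT}-{PT}\s+end\s+{PT}\s+(\d+)\s(.*)"
)

def parse_txt(line):
    """Parse one 'txt' record; return None for any other kind of line."""
    m = TXT_RE.fullmatch(line.strip())
    if m is None:
        return None
    v = [float(g) for g in m.groups()[:8]]
    return {
        "org": (v[0], v[1]),
        "bbox": (v[2], v[3], v[4], v[5]),
        "end": (v[6], v[7]),
        "count": int(m.group(9)),   # glyph count, as it appears in the sample
        "text": m.group(10),
    }

# First text run of the sample page, written out on one line.
rec = parse_txt(
    "txt   org [-117.0,927.0] bbox [-113.672,866.846]-[7.56836,927.002] "
    "end [9.98567,927.0] 2 An"
)
print(rec)
```

Lines such as bop, font, eop and shwpg simply return None here; a fuller
reader would dispatch on the first word of each line.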


-- 
Luca Fascione
