On Sun, Jun 15, 2025 at 6:12 PM Han-Wen Nienhuys <hanw...@gmail.com> wrote:
> On Sun, Jun 15, 2025 at 12:34 PM Luca Fascione <l.fasci...@gmail.com> wrote:
> > I do find your concept interesting though, I'm not sure what smoothing
> > filter you're using, but I wonder if this is a case where traditionally
> > "bad" filters would actually be advantageous here. Usually the "box"
> > filter is considered the worst, on account of the substantial ringing
> > it exhibits.
>
> It is a box filter, on account of efficiency: since this is part of
> the compile/debug cycle, it should be fast.

Sorry, I follow you, but only up to a point: the dependency of the
convolution speed on the kernel data should be minimal here; if you're
seeing it behave otherwise, may I ask what implementation you are using?
If you do something like use PIL to make a float buffer from a picture,
then numpy/scipy to convolve and extract the result, swapping one kernel
for another should have no effect on speed, and going from a plain box
average to an explicit kernel should be measurable but unimportant (say
10% slower, that kind of thing). How long does your test take to run at
the moment?

On a separate note, the correct thing to do is to use a _separable_
version of the filters (which for big filters and images can be a
performance advantage), not a radial one as is sometimes suggested. [1]

Btw, certain filters have this property: say that conv(r, k, img) is your
convolution of size r with kernel k. Then there are k's for which
conv(2r, k, img) == conv(r, k, conv(r, k, img)) (and in many cases it's a
passable approximation anyway). This is neat in your case, where you want
the whole blurring pyramid in hand anyway, because a radius-r convolution
is generally faster than a radius-2r one: all you need is a for loop that
keeps feeding the current result back into the conv operator, and you're
done. A long chain of multiply-adds, like a Horner-rule loop.

In fact, all this code comes already written for you, courtesy of the
graphics people: you can just make a mipmap of your image and accumulate
the (expanded) mip levels from that. Because you can leverage very robust
pre-existing implementations, you can rely on the mipmap algorithm being
already written correctly and optimized well beyond the effort you'll
want to spend yourself. OpenImageIO has a high-quality one called maketx,
for example. That thing has a level of on-the-ground hardening that would
be difficult to match.
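To make the for-loop idea concrete, here is a rough sketch of what I have
in mind, using PIL plus scipy as above (the file names, level count and
filter size are of course made up, and a plain box filter stands in for
whatever kernel you end up preferring):

# Sketch: build a stack of progressively blurred versions of an image by
# re-applying the same small box filter, instead of convolving once with
# ever-larger kernels.
import numpy as np
from PIL import Image
from scipy import ndimage

def blur_pyramid(path, levels=5, size=3):
    """Return [original, blur1, blur2, ...] as float32 arrays."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    pyramid = [img]
    current = img
    for _ in range(levels):
        # Each pass widens the effective filter support, so level n
        # approximates what a single, larger convolution would produce.
        current = ndimage.uniform_filter(current, size=size, mode="nearest")
        pyramid.append(current)
    return pyramid

# e.g. compare two renders level by level:
# for a, b in zip(blur_pyramid("ref.png"), blur_pyramid("new.png")):
#     print(np.abs(a - b).max())

If you go the mipmap route instead, the loop above collapses into "read
the mip levels back and upsample them", and the heavy lifting is all in
code somebody else has already debugged.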
> > Me, I had been musing if instead one could simply scan the EPS or PDF and
> > put all objects into a quadtree,
> > ...
>
> you can do very complex things, but

Forgive me, I don't follow: which part do you see as complex? I know I
write these kinds of algorithms all the time, so I'm probably more used
to this stuff than the average person, but from where I stand the "scene
processing" part is really easy (hundreds of lines of code if we roll it
out by hand, say using only the STL). For the EPS extraction I was
thinking of using a PostScript implementation, to have easier access to
the final coordinates, while for the PDF extraction there are libraries
like muPDF (IIRC) that also seem like they would make it not that big a
deal. There is also the option of asking Cairo to tell us what it's
doing, which on the one hand might save us time, but on the other
wouldn't protect us from bugs inside Cairo itself. That's a risk-analysis
choice I have no insight into.

> 1) right now, it only has to be better than the whole-image MAE diff
> 2) as part of the development cycle, it has to be fast.

Assuming here you mean "it has to run quickly", I am unclear why
comparing two lists of objects with a log(N) search time would be slower
than a convolution pyramid. In fact my expectation is the opposite: the
list-to-list comparison should be _much_ faster, especially for scenes
with depth complexity as low as ours (average number of objects per
pixel, which in our scenes is 0 for the vast majority of pixels, mostly 1
otherwise, and occasionally 2 or more) [2]. Also, this particular problem
structure lends itself to a many-to-many search approach, where you query
the nearest neighbours of all objects at once in a single traversal,
instead of querying one object at a time. There are several papers on
this subject; my favorite is by Attila Afra [3].

Given the stuff I've seen in the regression suite, the execution of the
test (parse scene, build the acceleration structure, run comparison,
write report) should be measured in milliseconds for a no-frills,
no-crazy-optimizations implementation. It could be microseconds if you
had the time to squeeze it down, but at that point I'd expect the whole
thing to be bottlenecked by scene loading anyway.

I think about it this way: a range/nearest-neighbour query like the ones
we need here is effectively about as complex as a ray-trace query.
Tracing primary hits on a 3D data structure with triangle counts in the
tens of millions runs, single threaded, at several frames per second for
images of, say, a million pixels (1k pixels square) [4]. So we're talking
tens of millions of queries per second on trees that are substantially
slower to traverse than the contents of our tests (slower for three kinds
of reasons: 1. the trees themselves are much deeper than in our case,
2. they separate a lot less well, i.e. they don't prune the search space
as effectively as we can, and 3. they are 3D datasets instead of 2D. It
doesn't seem like much, but this stuff adds up to more than one would
expect). I'll bet you the scene parse will be by far the slowest part of
the whole thing, and possibly slower still will be the
EPS/PDF-to-scene-description translation. But because we save on the
rasterization (which is painfully slow when using Ghostscript: for
reasons that escape me, that thing is just lethargic...), I still suspect
this is a substantial net win time-wise.

> if you want to create a quad tree of objects, it would be simpler to
> revisit the original testing framework (which I removed), which
> annotates Stencils with their bounding boxes and where they came from.

Well, I was thinking of something even simpler, actually: it occurred to
me that you don't even need the bbox for this. You just need some kind of
description of the shape: for lines and beziers it could be their CV list
and the like, for font-based elements the font/glyph-id pair, and for
externally placed elements just the "filename", position and sizing. Then
all you do is compare the reference positions of these shapes: either
it's the same object description at the same place (match) or it's not
(no match). Why would you need anything more? This is also a very
efficient way to compare bitmaps and other odd stuff in the final
product, for the same reason, and it bypasses all sorts of spurious
artifacts (random example: if you shift an image by half a pixel, all the
rendered pixels change. Depending on how you implement your image diff,
you could get a very high diff score, whereas the other approach will
just say "look, img7 is half a pixel to the left").
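To illustrate that last point, here is a rough sketch of the comparison I
have in mind. The record layout, field names and tolerance are invented
for the example (the real fields would come from whatever the EPS/PDF
extraction step produces), and I've replaced the quadtree query with a
plain exact-match lookup on quantized positions, which is enough to show
the shape of the thing; the tree only starts to matter if you also want
"nearest miss" reporting.

# Sketch only: Shape, its fields and the tolerance are made up for the
# example; a real version would be fed by the EPS/PDF extraction step.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Shape:
    kind: str    # e.g. "bezier", "glyph", "image"
    desc: tuple  # CV list, (font, glyph-id), or ("filename", w, h)
    x: float
    y: float

def _key(s, tol):
    # Quantize positions so sub-tolerance jitter still counts as a match.
    return (s.kind, s.desc, round(s.x / tol), round(s.y / tol))

def diff_scenes(ref, new, tol=0.001):
    """Return (missing, extra): shape keys only in ref, only in new."""
    a = Counter(_key(s, tol) for s in ref)
    b = Counter(_key(s, tol) for s in new)
    return list((a - b).elements()), list((b - a).elements())

In the half-pixel example above, the shifted image shows up as exactly
one "missing" and one "extra" record with the same description and
nearby positions, which the report can then pair up and phrase as "img7
moved by half a pixel" instead of a wall of differing pixels.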
I am sensitive to the argument that it is not very clear how much time we
should be investing in this task. However, it is taking literally years
to get anything done on this, so I'm not sure how much difference that
would make in practice. It seems to me the thing that would make the
biggest difference is for the suite to run fairly quickly, so that people
can run it often and bugs can be caught earlier.

[1] Quick sketch of the reason: even if pixels were not flat-colored
squares, their centers would still lie on a square lattice, so in the
diagonal direction they are spaced wider apart than in the axis-aligned
directions. Separable filters have a similar characteristic (if you
squint enough and want to believe), and lining the two up helps reduce
the observed artifacts in the result. There is a longer harmonic-analysis
story here than this margin has space for.

[2] There is an effect where, when the scene depth complexity is low, the
search runs faster than its average "theoretical" runtime would suggest
(roughly because it's easier to bail out of the acceleration structure
early, and because the tree becomes much better at pruning away parts of
the scene that are not relevant to the query).

[3] https://diglib.eg.org/bitstream/handle/10.2312/conf.EG2012.short.097-100/097-100.pdf?sequence=1

[4] This is student-grade code for paper benchmarking. If you think of
the ray-tracing speeds in today's video games, you can see they hit
numbers like hundreds of millions of rays per second, including executing
shaders and rebuilding the acceleration structure every frame.

--
Luca Fascione