On Sun, Jun 15, 2025 at 6:12 PM Han-Wen Nienhuys <hanw...@gmail.com> wrote:
> On Sun, Jun 15, 2025 at 12:34 PM Luca Fascione <l.fasci...@gmail.com> wrote:
> > I do find your concept interesting though, I'm not sure what smoothing
> > filter you're using, but I wonder if this is a case where traditionally
> > "bad" filters would actually be advantageous here. Usually the "box"
> > filter is considered the worst, on account of the substantial ringing
> > it exhibits.
>
> It is a box filter, on account of efficiency: since this is part of
> the compile/debug cycle, it should be fast.

Sorry, I follow you, but only up to a point: the dependency of the
convolution speed on the kernel data should be minimal here; if you're
seeing it behave otherwise, may I ask what implementation you are using?
If you do something like use PIL to make a float buffer from a picture,
then numpy/scipy to convolve and extract the result, swapping one kernel
for another should have no effect on speed, and going from a plain box
average to an explicit kernel should be measurable but unimportant (say
10% slower, that kind of thing). How long does your test take to run at
the moment?

On a separate note, the correct thing to do is to use a _separable_
version of the filters (which for big filters and images can be a
performance advantage), not a radial one as is sometimes suggested. [1]

Btw, certain filters have this property: say that conv(r, k, img) is your
convolution of size r with kernel k. Then there are k's for which
conv(2r, k, img) == conv(r, k, conv(r, k, img)) (and in many cases it's a
passable approximation anyway). This is neat in your case, where you want
the whole blurring pyramid in hand anyway, because a radius-r convolution
is generally faster than a radius-2r one: all you need is a for loop that
keeps feeding the current result back into the conv operator, and you're
done. A long chain of multiply-adds, like a Horner-rule loop.

In fact, all this code comes already written for you, courtesy of the
graphics people: you can just make a mipmap of your image and accumulate
the (expanded) mip levels from that. Because you can leverage very robust
pre-existing implementations, you can rely on the mipmap algorithm being
already written correctly and optimized well beyond the effort you'll
want to spend yourself. OpenImageIO has a high-quality one called maketx,
for example. That thing has a level of on-the-ground hardening that would
be difficult to match.
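To make the for-loop idea concrete, here is a rough sketch of what I have
in mind, using PIL plus scipy as above (the file names, level count and
filter size are of course made up, and a plain box filter stands in for
whatever kernel you end up preferring):

# Sketch: build a stack of progressively blurred versions of an image by
# re-applying the same small box filter, instead of convolving once with
# ever-larger kernels.
import numpy as np
from PIL import Image
from scipy import ndimage

def blur_pyramid(path, levels=5, size=3):
    """Return [original, blur1, blur2, ...] as float32 arrays."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    pyramid = [img]
    current = img
    for _ in range(levels):
        # Each pass widens the effective filter support, so level n
        # approximates what a single, larger convolution would produce.
        current = ndimage.uniform_filter(current, size=size, mode="nearest")
        pyramid.append(current)
    return pyramid

# e.g. compare two renders level by level:
# for a, b in zip(blur_pyramid("ref.png"), blur_pyramid("new.png")):
#     print(np.abs(a - b).max())

If you go the mipmap route instead, the loop above collapses into "read
the mip levels back and upsample them", and the heavy lifting is all in
code somebody else has already debugged.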
> > Me, I had been musing if instead one could simply scan the EPS or PDF and
> > put all objects into a quadtree,
> > ...
>
> you can do very complex things, but

Forgive me, I don't follow: which part do you see as complex? I know I
write these kinds of algorithms all the time, so I'm probably more used
to this stuff than the average person, but from where I stand the "scene
processing" part is really easy (hundreds of lines of code if we roll it
out by hand, say using only the STL). For the EPS extraction I was
thinking of using a PostScript implementation, to have easier access to
the final coordinates, while for the PDF extraction there are libraries
like muPDF (IIRC) that also seem like they would make it not that big a
deal. There is also the option of asking Cairo to tell us what it's
doing, which on the one hand might save us time, but on the other
wouldn't protect us from bugs inside Cairo itself. That's a risk-analysis
choice I have no insight into.

> 1) right now, it only has to be better than the whole-image MAE diff
> 2) as part of the development cycle, it has to be fast.

Assuming here you mean "it has to run quickly", I am unclear why
comparing two lists of objects with a log(N) search time would be slower
than a convolution pyramid. In fact my expectation is the opposite: the
list-to-list comparison should be _much_ faster, especially for scenes
with depth complexity as low as ours (average number of objects per
pixel, which in our scenes is 0 for the vast majority of pixels, mostly 1
otherwise, and occasionally 2 or more) [2]. Also, this particular problem
structure lends itself to a many-to-many search approach, where you query
the nearest neighbours of all objects at once in a single traversal,
instead of querying one object at a time. There are several papers on
this subject; my favorite is by Attila Afra [3].

Given the stuff I've seen in the regression suite, the execution of the
test (parse scene, build the acceleration structure, run comparison,
write report) should be measured in milliseconds for a no-frills,
no-crazy-optimizations implementation. It could be microseconds if you
had the time to squeeze it down, but at that point I'd expect the whole
thing to be bottlenecked by scene loading anyway.

I think about it this way: a range/nearest-neighbour query like the ones
we need here is effectively about as complex as a ray-trace query.
Tracing primary hits on a 3D data structure with triangle counts in the
tens of millions runs, single threaded, at several frames per second for
images of, say, a million pixels (1k pixels square) [4]. So we're talking
tens of millions of queries per second on trees that are substantially
slower to traverse than the contents of our tests (slower for three kinds
of reasons: 1. the trees themselves are much deeper than in our case,
2. they separate a lot less well, i.e. they don't prune the search space
as effectively as we can, and 3. they are 3D datasets instead of 2D. It
doesn't seem like much, but this stuff adds up to more than one would
expect). I'll bet you the scene parse will be by far the slowest part of
the whole thing, and possibly slower still will be the
EPS/PDF-to-scene-description translation. But because we save on the
rasterization (which is painfully slow when using Ghostscript: for
reasons that escape me, that thing is just lethargic...), I still suspect
this is a substantial net win time-wise.

> if you want to create a quad tree of objects, it would be simpler to
> revisit the original testing framework (which I removed), which
> annotates Stencils with their bounding boxes and where they came from.

Well, I was thinking of something even simpler, actually: it occurred to
me that you don't even need the bbox for this. You just need some kind of
description of the shape: for lines and beziers it could be their CV list
and the like, for font-based elements the font/glyph-id pair, and for
externally placed elements just the "filename", position and sizing. Then
all you do is compare the reference positions of these shapes: either
it's the same object description at the same place (match) or it's not
(no match). Why would you need anything more? This is also a very
efficient way to compare bitmaps and other odd stuff in the final
product, for the same reason, and it bypasses all sorts of spurious
artifacts (random example: if you shift an image by half a pixel, all the
rendered pixels change. Depending on how you implement your image diff,
you could get a very high diff score, whereas the other approach will
just say "look, img7 is half a pixel to the left").
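To illustrate that last point, here is a rough sketch of the comparison I
have in mind. The record layout, field names and tolerance are invented
for the example (the real fields would come from whatever the EPS/PDF
extraction step produces), and I've replaced the quadtree query with a
plain exact-match lookup on quantized positions, which is enough to show
the shape of the thing; the tree only starts to matter if you also want
"nearest miss" reporting.

# Sketch only: Shape, its fields and the tolerance are made up for the
# example; a real version would be fed by the EPS/PDF extraction step.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Shape:
    kind: str    # e.g. "bezier", "glyph", "image"
    desc: tuple  # CV list, (font, glyph-id), or ("filename", w, h)
    x: float
    y: float

def _key(s, tol):
    # Quantize positions so sub-tolerance jitter still counts as a match.
    return (s.kind, s.desc, round(s.x / tol), round(s.y / tol))

def diff_scenes(ref, new, tol=0.001):
    """Return (missing, extra): shape keys only in ref, only in new."""
    a = Counter(_key(s, tol) for s in ref)
    b = Counter(_key(s, tol) for s in new)
    return list((a - b).elements()), list((b - a).elements())

In the half-pixel example above, the shifted image shows up as exactly
one "missing" and one "extra" record with the same description and
nearby positions, which the report can then pair up and phrase as "img7
moved by half a pixel" instead of a wall of differing pixels.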
I am sensitive to the argument that it is not very clear how much time we
should be investing in this task. However, it is taking literally years
to get anything done on this, so I'm not sure how much difference that
would make in practice. It seems to me the thing that would make the
biggest difference is for the suite to run fairly quickly, so that people
can run it often and bugs can be caught earlier.

[1] Quick sketch of the reason: even if pixels were not flat-colored
squares, their centers would still lie on a square lattice, so in the
diagonal direction they are spaced wider apart than in the axis-aligned
directions. Separable filters have a similar characteristic (if you
squint enough and want to believe), and lining the two up helps reduce
the observed artifacts in the result. There is a longer harmonic-analysis
story here than this margin has space for.

[2] There is an effect where, when the scene depth complexity is low, the
search runs faster than its average "theoretical" runtime would suggest
(roughly because it's easier to bail out of the acceleration structure
early, and because the tree becomes much better at pruning away parts of
the scene that are not relevant to the query).

[3] https://diglib.eg.org/bitstream/handle/10.2312/conf.EG2012.short.097-100/097-100.pdf?sequence=1

[4] This is student-grade code for paper benchmarking. If you think of
the ray-tracing speeds in today's video games, you can see they hit
numbers like hundreds of millions of rays per second, including executing
shaders and rebuilding the acceleration structure every frame.

--
Luca Fascione