Hi again,

> I feel as if this much effort put into creating fast alternatives for these
> operations is an interesting academic pursuit, but we might be better
> served by analyzing how we use floor/ceil and finding ways to reduce those
> or find more targeted algorithms for those on a case by case basis - if
> they are in an inner loop. The foo_int() methods are the ones that I'm
> mainly interested in as they pertain to the inner loop of the rasterizer - on
> the other hand we might be able to avoid them with fixed point arithmetic
> instead.

I think I covered both points:

- FloatMath.ceil_f(float) and FloatMath.floor_f(float) are my "standard"
  implementations for general use. They behave like
  (float)StrictMath.floor(double)... and proved to be 25% faster. I provided
  a test class to prove these methods are exact.
  Joe, what is your opinion? Is it interesting for the core-libs?

- FloatMath.ceil_int(float) and FloatMath.floor_int(float) are optimized for
  Marlin use cases (integer domain).

I list here all use cases in the fixed-point variant:

ceil():
  Renderer.addLine
    353: firstCrossing = Math.max(FloatMath.ceil(y1 - 0.5f), _boundsMinY);
    357: lastCrossing = Math.min(FloatMath.ceil(y2 - 0.5f), boundsMaxY);
  Renderer.endRendering
    1293: spminX = Math.max(FloatMath.ceil(edgeMinX - 0.5f), boundsMinX);
    1294: spmaxX = Math.min(FloatMath.ceil(edgeMaxX - 0.5f), boundsMaxX - 1);
    1296: spminY = Math.max(FloatMath.ceil(edgeMinY - 0.5f), _boundsMinY);
    1298: maxY = FloatMath.ceil(edgeMaxY - 0.5f);

floor():
  MarlinRenderingEngine.NormalizingPathIterator.currentSegment
    531: x_adjust = FloatMath.floor(coord) + rval - coord;
    535: y_adjust = FloatMath.floor(coord) + rval - coord;
    545: x_adjust = FloatMath.floor(coord + lval) + rval - coord;
    549: y_adjust = FloatMath.floor(coord + lval) + rval - coord;
  Renderer.addLine
    475: final float floor_slope = FloatMath.floor(slope);

Let me stress that these optimized methods do provide some speedup (even if
small):
- addLine() = 2 ceil() calls + 1 floor() call per edge => millions!
- endRendering() = 4 ceil() calls per shape => thousands!
- NormalizingPathIterator: 2 floor() calls per segment => many, BUT only if
  normalization is enabled = exceptional case.

For me, ceil() / floor() are not used in the "inner loop of the rasterizer"
but they are still in another "hot" loop = the shape loop:

for-each (shape) {
    AATileGenerator aatg = renderengine.getAATileGenerator(s, sg.transform,
                               clip, bs, thin, adjust, abox);
    ...
    renderTiles(sg, s, aatg, abox, ts);
}

That's why the speedup depends on the shape count or the shape complexity
(edge count).

> With regard to using them in the normalizing iterator - are the target
> customers leaving normalization enabled for their shape rendering? For
> cases like map rendering and other typical server rendering issues I would
> think that they would want it off for more accurate paths, and also to get
> rid of some unnecessary pre-processing that was only originally meant to be
> a band-aid for developers who were expecting drawRect(x,y,w-1,h-1) to touch
> the row of pixels around the inside of that rectangle. If we get rid of
> normalization there are likely few other uses of floor/ceil in our
> rendering flow...

I agree: neither my benchmark nor map rendering uses the normalizing
iterator. However, it is part of the RenderingEngine interface, so it must
remain supported for compatibility reasons, but it may be left unoptimized.

PS: The NaN / Infinity issue (i.e. integer overflow) remains, and it still
needs a new pre-processing stage in the Marlin pipeline.

Laurent
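
To illustrate the integer-domain variants discussed above, here is a minimal sketch of a cast-and-adjust floor/ceil returning int. It is not the actual FloatMath code from the patch: the class name FloatMathSketch and the method names floorInt / ceilInt are hypothetical, and the technique shown is only one common way to do this, assumed here for illustration. It also shows why NaN and Infinity need separate handling: the plain cast saturates and the +/-1 adjustment can then wrap around.

```java
// Hypothetical sketch, NOT the Marlin FloatMath implementation.
final class FloatMathSketch {
    private FloatMathSketch() {}

    /**
     * floor(a) as an int, assuming a is finite and within int range.
     * NaN yields 0 (cast semantics); out-of-range values are not handled.
     */
    static int floorInt(final float a) {
        final int i = (int) a; // the cast truncates toward zero
        // truncation rounds negative non-integers up: step down by one
        return (a < i) ? i - 1 : i;
    }

    /**
     * ceil(a) as an int, assuming a is finite and within int range.
     * NaN yields 0 (cast semantics); out-of-range values are not handled.
     */
    static int ceilInt(final float a) {
        final int i = (int) a; // the cast truncates toward zero
        // truncation rounds positive non-integers down: step up by one
        return (a > i) ? i + 1 : i;
    }

    public static void main(String[] args) {
        System.out.println(floorInt(-0.5f)); // -1
        System.out.println(ceilInt(2.25f));  // 3
        // +Infinity casts to Integer.MAX_VALUE, then "+ 1" wraps around
        System.out.println(ceilInt(Float.POSITIVE_INFINITY) == Integer.MIN_VALUE); // true
    }
}
```

For in-range finite inputs these agree with (int) Math.floor(a) and (int) Math.ceil(a); the wrap-around on infinite input is the kind of overflow that a pre-processing stage would have to filter out.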