Hi Laurent,

I feel as if this much effort put into creating fast alternatives for these operations is an interesting academic pursuit, but we might be better served by analyzing how we use floor/ceil and finding ways to reduce those uses, or by finding more targeted algorithms on a case by case basis, if they are in an inner loop. The foo_int() methods are the ones that I'm mainly interested in, as they pertain to the inner loop of the rasterizer. On the other hand, we might be able to avoid them with fixed point arithmetic instead.

With regard to using them in the normalizing iterator - are the target customers leaving normalization enabled for their shape rendering? For cases like map rendering and other typical server rendering issues I would think that they would want it off for more accurate paths, and also to get rid of some unnecessary pre-processing that was only originally meant to be a band-aid for developers who were expecting drawRect(x,y,w-1,h-1) to touch the row of pixels around the inside of that rectangle. If we get rid of normalization there are likely few other uses of floor/ceil in our rendering flow...

                        ...jim

On 7/3/15 1:51 PM, Laurent Bourgès wrote:
Jim,

Here is an updated webrev:
http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/

Changes:
- enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
- renamed faster alternatives as int ceil_int(float) and float
floor_int(float) that are faster in the integer domain
- restored ceil_f / floor_f (float) methods that are strictly correct as
(float) StrictMath.ceil/floor(double)
- made FloatMath class and its methods public to be available for tests
and maybe more general use in graphics / java2d ...

It is still faster than the previous FloatMath, and Marlin is a bit faster too:
see the results at the end!


Here are a few comments on Joe's proposal:

    >> I could propose my implementations of float ceil/floor (float) that give
    >> exactly the same results as (float) StrictMath.ceil/floor (double).
    >> According to my benchmarks, they are 25% faster.
    >
    > I don't think we need to limit ourselves to either StrictMath or Math. We
    > simply need something predictable that has properties which work for our needs.
    >
    I was just proposing the 2 methods float ceil/floor (float) (derived
    from StrictMath) to be included in the core libs if they are useful for
    general use (25% faster).

Joe, are you interested in the ceil_f / floor_f variants (25% faster than
StrictMath)?

So, you can *almost* get away with

int ceil_returning_int(float f) {
      if (f > 0.0)
          return - ((int)(-f));
      else
          return (int) f;
}

int floor_returning_int(float f) {
      if (f < 0.0)
          return - ((int)(-f));
      else
          return (int) f;
}

I tried Joe's proposal but it does not work:
rounding toward zero is not equivalent to ceil or floor!


In what way do Joe's techniques fail?  Integer casts should be a truncate
operation (is that what you refer to as "round to zero"?) and should be the
same as floor() for non-negative numbers, and -((int)(-v)) should be the same
as floor for negative numbers...



I tried it and it does not work:

ceil (1.2)=2
But (int)(-1.2)=-1 (round to zero).
So the result is 1 and not 2 !
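
The failure can be reproduced with a small sketch. It restates the two quoted methods with the parameter typed as float and compares them with Math.ceil / Math.floor; the class name and the sample values are only illustrative.

```java
final class TruncationDemo {
    // Quoted proposal: the (int) cast truncates toward zero, and negating
    // before and after the cast still truncates toward zero.
    static int ceilReturningInt(float f) {
        if (f > 0.0f) {
            return -((int) (-f));
        }
        return (int) f;
    }

    static int floorReturningInt(float f) {
        if (f < 0.0f) {
            return -((int) (-f));
        }
        return (int) f;
    }

    public static void main(String[] args) {
        float[] samples = {1.2f, -1.2f, 3.0f, -3.0f};
        for (float f : samples) {
            System.out.printf("f=%s castCeil=%d Math.ceil=%d castFloor=%d Math.floor=%d%n",
                    f,
                    ceilReturningInt(f), (int) Math.ceil(f),
                    floorReturningInt(f), (int) Math.floor(f));
        }
    }
}
```

For 1.2 the cast-based ceil yields 1 where Math.ceil yields 2, and for -1.2 the cast-based floor yields -1 where Math.floor yields -2. The other two combinations (ceil of a negative value, floor of a positive value) and exact integers agree.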

That's why my variant adds/subtracts 1!
But it makes infinity / NaN handling more painful and a bit costly.
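
As a rough illustration of the add/subtract-1 idea and of why infinities are awkward, here is a minimal sketch. It is not the FloatMath code from the webrev: the method names and the saturation policy in the guarded versions are assumptions made for this example.

```java
final class AdjustedRounding {
    // Truncate, then step up by one if truncation moved the value down.
    static int ceilInt(float a) {
        int i = (int) a;            // NaN -> 0, +/-Infinity saturate to MAX/MIN_VALUE
        return (a > i) ? i + 1 : i; // +Infinity: i + 1 wraps around
    }

    // Truncate, then step down by one if truncation moved the value up.
    static int floorInt(float a) {
        int i = (int) a;
        return (a < i) ? i - 1 : i; // -Infinity: i - 1 wraps around
    }

    // Hypothetical guard: skip the adjustment when the cast already saturated.
    static int ceilIntGuarded(float a) {
        int i = (int) a;
        return (a > i && i != Integer.MAX_VALUE) ? i + 1 : i;
    }

    static int floorIntGuarded(float a) {
        int i = (int) a;
        return (a < i && i != Integer.MIN_VALUE) ? i - 1 : i;
    }

    public static void main(String[] args) {
        System.out.println(ceilInt(1.2f));    // 2
        System.out.println(floorInt(-1.2f));  // -2
        System.out.println(ceilInt(Float.POSITIVE_INFINITY) == Integer.MIN_VALUE);        // true
        System.out.println(ceilIntGuarded(Float.POSITIVE_INFINITY) == Integer.MAX_VALUE); // true
    }
}
```

The unguarded versions are correct for finite values inside the int range, but the extra comparison in the guarded versions is the kind of cost that the NaN / overflow checks add.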


Jim, next I will test:

1/ using a proper and consistent ceil(coord - 0.5) as you did in openpisces (FX)

2/ using a fixed point approach (longer work) so that only integer math is
used in the Marlin rendering loop (crossings)

=> faster (no float to int conversions?) but also more scalable on
hyper-threaded CPUs?

=> the edge array will then only contain int[] and Unsafe usage is no longer
necessary
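
To make the fixed point idea concrete, here is a minimal sketch of floor and ceil done with integer operations only. The 24.8 layout (8 fractional bits) and all names are assumptions for illustration; nothing here is taken from Marlin.

```java
final class FixedPointSketch {
    static final int FRAC_BITS = 8;            // hypothetical 24.8 layout
    static final int ONE = 1 << FRAC_BITS;     // 1.0 in fixed point
    static final int FRAC_MASK = ONE - 1;

    // Single float-to-int conversion, done once when an edge is set up.
    static int toFixed(float v) {
        return (int) Math.floor(v * ONE);
    }

    // Arithmetic right shift rounds toward negative infinity, i.e. floor.
    static int floorFixed(int fx) {
        return fx >> FRAC_BITS;
    }

    // Adding (ONE - 1) before the shift rounds up, i.e. ceil.
    static int ceilFixed(int fx) {
        return (fx + FRAC_MASK) >> FRAC_BITS;
    }

    public static void main(String[] args) {
        int x = toFixed(1.25f);      // 320
        int dx = toFixed(-0.5f);     // hypothetical per-scanline slope, -128
        for (int row = 0; row < 4; row++) {
            // Inner loop uses only integer adds and shifts.
            System.out.println("row " + row + ": floor=" + floorFixed(x) + " ceil=" + ceilFixed(x));
            x += dx;
        }
    }
}
```

The trade-off is precision: with 8 fractional bits the coordinates are quantized to 1/256 of a pixel, so the number of fractional bits would need to be chosen against the accuracy the renderer requires.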


Cheers,

Laurent


PS: Here are some benchmark results made on values only in the integer
domain:

JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]
floats = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2, -1.9,
-0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23,
2.13422758E9, 1.37992608E8, 134758.4]

strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]
floatMathCeil    = [-2134227584, -137992608, -134758, -131, -17, -1, 0,
0, 0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1,
2134227584, 137992608, 134759]
FloatMathCeil_f  = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]

strictMathFloor_f   = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor    = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor_f  = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
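
When comparing listings like these, note that == treats -0.0 and 0.0 as equal, so a check for "exactly the same results" needs a bitwise comparison. In the ceil listing above, the entry for the -0.0 input appears as -0.0 for strictMathCeil_f and as 0.0 for FloatMathCeil_f. The sketch below shows one way such a difference can be detected; candidateCeil is a deliberately simple stand-in written for this example, not FloatMath.ceil_f.

```java
final class CeilComparison {
    interface FloatFn {
        float apply(float f);
    }

    // Reference result, as used in the thread.
    static float referenceCeil(float f) {
        return (float) StrictMath.ceil(f);
    }

    // Hypothetical candidate: correct in value, but it loses the sign of zero.
    static float candidateCeil(float f) {
        if (f != f || Math.abs(f) >= 8388608.0f) {
            return f;               // NaN, infinity, or |f| >= 2^23 (already integral)
        }
        int i = (int) f;            // truncate toward zero
        return (f > i) ? i + 1 : i; // int result widened to float, so -0.0 becomes 0.0
    }

    // Counts inputs whose results differ bit for bit (NaN is canonicalized).
    static int countBitwiseMismatches(float[] inputs, FloatFn a, FloatFn b) {
        int mismatches = 0;
        for (float f : inputs) {
            if (Float.floatToIntBits(a.apply(f)) != Float.floatToIntBits(b.apply(f))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        float[] inputs = {-131.5f, -1.9f, -0.0f, 0.0f, 1.9f, 131.5f};
        System.out.println(countBitwiseMismatches(inputs,
                CeilComparison::referenceCeil, CeilComparison::candidateCeil));
    }
}
```

With these inputs only -0.0 differs, because the candidate returns 0.0 while the reference keeps -0.0. Whether that matters depends on how the result is used; for values that are immediately converted to int it does not.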

--- Benchmarks ---
# Calib: run duration: 5 000 ms
4 threads, Tavg = 3,03 ns/op (σ = 0,02 ns/op), Total ops = 6616663415 [3,07 (1633532581), 3,02 (1656839291), 3,01 (1662195896), 3,01 (1664095647)]
#

#-------------------------------------------------------------
# StrictMathCeil_f: run duration: 5 000 ms
float = (float) StrictMath.ceil(f)

1 threads, Tavg = 112,46 ns/op (σ = 0,00 ns/op), Total ops = 44462614 [112,46 (44462614)]
2 threads, Tavg = 112,53 ns/op (σ = 0,20 ns/op), Total ops = 88864503 [112,74 (44351706), 112,33 (44512797)]
3 threads, Tavg = 112,75 ns/op (σ = 0,31 ns/op), Total ops = 133042189 [112,67 (44379882), 112,42 (44478562), 113,17 (44183745)]
4 threads, Tavg = 113,61 ns/op (σ = 1,18 ns/op), Total ops = 176004512 [115,63 (43242922), 113,27 (44144214), 112,59 (44409190), 113,01 (44208186)]
#

#-------------------------------------------------------------
# FloatMathCeil_f: run duration: 5 000 ms
float = FloatMath.ceil_f(f)

1 threads, Tavg = 85,42 ns/op (σ = 0,00 ns/op), Total ops = 58534818 [85,42 (58534818)]
2 threads, Tavg = 85,56 ns/op (σ = 0,18 ns/op), Total ops = 116880361 [85,74 (58318655), 85,38 (58561706)]
3 threads, Tavg = 85,49 ns/op (σ = 0,11 ns/op), Total ops = 175469910 [85,64 (58386401), 85,42 (58535723), 85,40 (58547786)]
4 threads, Tavg = 86,10 ns/op (σ = 0,86 ns/op), Total ops = 232739544 [87,59 (57200792), 85,61 (58519538), 85,47 (58617544), 85,79 (58401670)]
#

#-------------------------------------------------------------
# FloatMathCeil: run duration: 5 000 ms
int = FloatMath.ceil(f)

1 threads, Tavg = 56,72 ns/op (σ = 0,00 ns/op), Total ops = 88153017 [56,72 (88153017)]
2 threads, Tavg = 56,90 ns/op (σ = 0,16 ns/op), Total ops = 175737994 [57,06 (87626873), 56,75 (88111121)]
3 threads, Tavg = 56,82 ns/op (σ = 0,15 ns/op), Total ops = 264003134 [57,02 (87684429), 56,76 (88087214), 56,67 (88231491)]
4 threads, Tavg = 57,16 ns/op (σ = 0,57 ns/op), Total ops = 350060098 [58,12 (86072473), 56,74 (88161260), 56,68 (88251450), 57,12 (87574915)]
#

#-------------------------------------------------------------
# StrictMathFloor_f: run duration: 5 000 ms
float = (float) StrictMath.floor(f)

1 threads, Tavg = 108,69 ns/op (σ = 0,00 ns/op), Total ops = 46005419 [108,69 (46005419)]
2 threads, Tavg = 108,87 ns/op (σ = 0,25 ns/op), Total ops = 91856264 [109,11 (45824174), 108,62 (46032090)]
3 threads, Tavg = 108,66 ns/op (σ = 0,01 ns/op), Total ops = 138046291 [108,65 (46019660), 108,68 (46008068), 108,65 (46018563)]
4 threads, Tavg = 109,99 ns/op (σ = 1,00 ns/op), Total ops = 182162538 [111,63 (44870853), 109,77 (45631259), 108,90 (45994047), 109,69 (45666379)]
#

#-------------------------------------------------------------
# FloatMathFloor_f: run duration: 5 000 ms
float = FloatMath.floor_f(f)

1 threads, Tavg = 79,60 ns/op (σ = 0,00 ns/op), Total ops = 62816917 [79,60 (62816917)]
2 threads, Tavg = 79,44 ns/op (σ = 0,15 ns/op), Total ops = 125890579 [79,58 (62827873), 79,29 (63062706)]
3 threads, Tavg = 79,38 ns/op (σ = 0,15 ns/op), Total ops = 188968096 [79,59 (62823628), 79,23 (63107367), 79,32 (63037101)]
4 threads, Tavg = 79,88 ns/op (σ = 0,83 ns/op), Total ops = 250828233 [81,31 (61604026), 79,60 (62930953), 79,32 (63149634), 79,33 (63143620)]
#

#-------------------------------------------------------------
# FloatMathFloor: run duration: 5 000 ms
float = FloatMath.floor(f)

1 threads, Tavg = 70,20 ns/op (σ = 0,00 ns/op), Total ops = 71226367 [70,20 (71226367)]
2 threads, Tavg = 70,35 ns/op (σ = 0,16 ns/op), Total ops = 142141053 [70,51 (70910131), 70,20 (71230922)]
3 threads, Tavg = 70,26 ns/op (σ = 0,08 ns/op), Total ops = 213504247 [70,20 (71225449), 70,38 (71046834), 70,19 (71231964)]
4 threads, Tavg = 70,67 ns/op (σ = 0,60 ns/op), Total ops = 283376128 [70,24 (71279973), 70,58 (70931050), 70,20 (71320272), 71,68 (69844833)]
#
