Re: [OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Jim Graham Tue, 07 Jul 2015 18:24:57 -0700

Hi Laurent,

I feel as if this much effort put into creating fast alternatives for these operations is an interesting academic pursuit, but we might be better served by analyzing how we use floor/ceil and finding ways to reduce those uses, or finding more targeted algorithms for them on a case-by-case basis, if they are in an inner loop. The foo_int() methods are the ones that I'm mainly interested in, as they pertain to the inner loop of the rasterizer. On the other hand, we might be able to avoid them with fixed-point arithmetic instead.

With regard to using them in the normalizing iterator: are the target customers leaving normalization enabled for their shape rendering? For cases like map rendering and other typical server rendering issues I would think that they would want it off for more accurate paths, and also to get rid of some unnecessary pre-processing that was only originally meant to be a band-aid for developers who were expecting drawRect(x,y,w-1,h-1) to touch the row of pixels around the inside of that rectangle. If we get rid of normalization there are likely few other uses of floor/ceil in our rendering flow...


                        ...jim

On 7/3/15 1:51 PM, Laurent Bourgès wrote:

Jim,

Here is an updated webrev:
http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/

Changes:
- enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
- renamed faster alternatives as int ceil_int(float) and float
floor_int(float) that are faster in the integer domain
- restored ceil_f / floor_f (float) methods that are strictly correct as
(float) StrictMath.ceil/floor(double)
- made FloatMath class and its methods public to be available for tests
and maybe more general use in graphics / java2d ...
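
For readers who want to see what a float-only ceil/floor that agrees with (float) StrictMath.ceil/floor(double) can look like, here is a minimal sketch. It is not the FloatMath code from the webrev: the class name, method names and the constant are made up for illustration, and no claim is made about its speed.

```java
final class FloatCeilFloorSketch {
    // 2^23: every float whose magnitude is >= 2^23 is already an integer.
    private static final float INTEGRAL_LIMIT = 8388608f;

    // Hypothetical helper, intended to match (float) StrictMath.ceil((double) a).
    static float ceilF(float a) {
        // NaN, infinities and large magnitudes are returned unchanged
        // (the negated comparison is also true for NaN).
        if (!(Math.abs(a) < INTEGRAL_LIMIT)) {
            return a;
        }
        float t = (float) (int) a;   // truncate toward zero (exact here)
        if (t < a) {
            t += 1f;                 // positive non-integers round up
        }
        return Math.copySign(t, a);  // keeps -0.0 for inputs in (-1, -0.0]
    }

    // Hypothetical helper, intended to match (float) StrictMath.floor((double) a).
    static float floorF(float a) {
        if (!(Math.abs(a) < INTEGRAL_LIMIT)) {
            return a;
        }
        float t = (float) (int) a;   // truncate toward zero (exact here)
        if (t > a) {
            t -= 1f;                 // negative non-integers round down
        }
        return Math.copySign(t, a);  // keeps -0.0 for an input of -0.0
    }

    public static void main(String[] args) {
        float[] samples = { -131.5f, -0.9f, -0.0f, 0.0f, 0.9f, 131.5f, Float.NaN };
        for (float v : samples) {
            // Compare bit patterns so that -0.0 and 0.0 are told apart.
            boolean sameCeil = Float.floatToIntBits(ceilF(v))
                    == Float.floatToIntBits((float) StrictMath.ceil(v));
            boolean sameFloor = Float.floatToIntBits(floorF(v))
                    == Float.floatToIntBits((float) StrictMath.floor(v));
            System.out.println(v + " ceil ok=" + sameCeil + " floor ok=" + sameFloor);
        }
    }
}
```

The comparison uses Float.floatToIntBits so that a result of 0.0 where StrictMath returns -0.0 shows up as a mismatch.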

It is still faster than the previous FloatMath, and Marlin is a bit faster too:
see the results at the end!


Here are a few comments on Joe's proposal:

    >> I could propose my implementations of float ceil/floor (float) that
    >> give exactly the same results as (float) StrictMath.ceil/floor (double).
    >> According to my benchmarks, it is 25% faster.

    >
    >
    > I don't think we need to limit ourselves to either StrictMath or Math. We
    > simply need something predictable that has properties which work for our needs.
    >
    I was just proposing the 2 methods float ceil/floor (float) (derived
    from StrictMath) to be included in the core libs if they are useful for
    general use (25% faster).

Joe, are you interested in the ceil_f / floor_f variants (25% faster than
StrictMath)?

So, you can *almost* get away with

int ceil_returning_int(float f) {
      if (f > 0.0)
          return - ((int)(-f));
      else
          return (int) f;
}

int floor_returning_int(float f) {
      if (f < 0.0)
          return - ((int)(-f));
      else
          return (int) f;
}

I tried Joe's proposal but it does not work:
rounding toward zero is not equivalent to ceil or floor!



In what way do Joe's techniques fail?  Integer casts should be a truncate
operation (is that what you refer to as "round to zero"?) and should be the
same as floor() for non-negative numbers and -((int)(-v)) should be the same
as floor for negative numbers...


I tried and it does not work:

ceil(1.2) = 2
But (int)(-1.2) = -1 (round to zero).
So the result is 1 and not 2!

That's why my variant adds/subtracts 1!
But it makes infinity / NaN handling more painful and a bit costly.
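
To make the failure concrete, here is a minimal Java sketch that prints the truncating cast, the -((int)(-v)) form, and a ceil/floor corrected by +1/-1. The class and method names are hypothetical (this is not the FloatMath code from the webrev), and the helpers assume finite inputs within the int range, so they leave out exactly the NaN / infinity / overflow handling mentioned above.

```java
final class CeilFloorIntSketch {
    // Hypothetical helper: ceil in the int domain.
    // Assumes v is finite and fits in an int (no NaN / overflow checks).
    static int ceilInt(float v) {
        int i = (int) v;             // truncates toward zero
        return (v > i) ? i + 1 : i;  // only positive non-integers need +1
    }

    // Hypothetical helper: floor in the int domain, same assumptions.
    static int floorInt(float v) {
        int i = (int) v;             // truncates toward zero
        return (v < i) ? i - 1 : i;  // only negative non-integers need -1
    }

    public static void main(String[] args) {
        float[] values = { -1.9f, -1.2f, -0.9f, 0.0f, 0.9f, 1.2f, 1.9f, 3.0f };
        for (float v : values) {
            int cast = (int) v;
            int negatedCast = -((int) (-v));  // same value as the plain cast
            System.out.println(v + ": (int)=" + cast
                    + " -((int)(-v))=" + negatedCast
                    + " ceil=" + ceilInt(v)
                    + " floor=" + floorInt(v));
        }
    }
}
```

Negating before and after the cast changes nothing, because truncation is symmetric around zero; the correction has to come from comparing the input with the truncated value.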


Jim, next I will test:

1/ use proper and consistent ceil(coord - 0.5) as you did in openpisces (FX)

2/ use a fixed-point approach (longer work) to use only integer math in
the Marlin rendering loop (crossings)

=> faster (no float-to-int conversions?) but also more scalable on
hyper-threading CPUs?

=> the edge array will then only contain int[] and Unsafe usage is no
longer necessary
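
As a rough illustration of the fixed-point idea, the sketch below converts coordinates to integers once and then uses only integer adds and shifts per scanline. Everything in it is an assumption for illustration: the 24.8 format (8 fractional bits), the class and method names, and the simple edge walk. It does not reflect how Marlin stores edges or computes crossings.

```java
import java.util.Arrays;

final class FixedPointSketch {
    // Assumed format: 8 fractional bits, chosen only for illustration.
    static final int FRAC_BITS = 8;
    static final int ONE = 1 << FRAC_BITS;

    // The only float-to-int conversion, done once at edge setup.
    // Assumes v * ONE fits in an int.
    static int toFixed(float v) {
        return (int) Math.floor(v * ONE);
    }

    // floor of a fixed-point value: arithmetic shift rounds toward -infinity.
    static int floorFixed(int fx) {
        return fx >> FRAC_BITS;
    }

    // ceil of a fixed-point value: add (ONE - 1) before shifting.
    static int ceilFixed(int fx) {
        return (fx + ONE - 1) >> FRAC_BITS;
    }

    // Walks an edge starting at x0 with a constant x step per scanline and
    // returns the floor of x on each row, using integer arithmetic only.
    static int[] crossings(float x0, float dxPerRow, int rows) {
        int x = toFixed(x0);
        int step = toFixed(dxPerRow);
        int[] out = new int[rows];
        for (int i = 0; i < rows; i++) {
            out[i] = floorFixed(x);
            x += step;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(crossings(0.5f, 0.25f, 4)));
        System.out.println(floorFixed(toFixed(-1.2f)) + " " + ceilFixed(toFixed(-1.2f)));
    }
}
```

The trade-off is precision: with 8 fractional bits the slope is quantized to 1/256 of a pixel per row, so a real implementation would have to pick the number of fractional bits against the coordinate range it needs to cover.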


Cheers,

Laurent


PS: Here are some benchmark results made on values only in the integer
domain:

JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]

floats = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2, -1.9,
-0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23,
2.13422758E9, 1.37992608E8, 134758.4]

strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]
floatMathCeil    = [-2134227584, -137992608, -134758, -131, -17, -1, 0,
0, 0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1,
2134227584, 137992608, 134759]
FloatMathCeil_f  = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]

strictMathFloor_f   = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor    = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor_f  = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]

--- Benchmarks ---
# Calib: run duration:  5 000 ms
4 threads, Tavg =      3,03 ns/op (σ =   0,02 ns/op), Total ops =
6616663415 [     3,07 (1633532581),      3,02 (1656839291),      3,01
(1662195896),      3,01 (1664095647)]
#

#-------------------------------------------------------------
*# StrictMathCeil_f:* run duration:  5 000 ms
*float = (float) StrictMath.ceil(f)*

             1 threads, Tavg =    112,46 ns/op (σ =   0,00 ns/op), Total
ops =     44462614 [   112,46 (44462614)]
2 threads, Tavg =    112,53 ns/op (σ =   0,20 ns/op), Total ops =
88864503 [   112,74 (44351706),    112,33 (44512797)]
             3 threads, Tavg =    112,75 ns/op (σ =   0,31 ns/op), Total
ops = 133042189 [   112,67 (44379882),    112,42 (44478562),    113,17
(44183745)]
*            4 threads, Tavg =    113,61 ns/op (σ =   1,18 ns/op),*
Total ops =    176004512 [   115,63 (43242922),    113,27 (44144214),
112,59 (44409190),    113,01 (44208186)]
#

#-------------------------------------------------------------
*# FloatMathCeil_f:* run duration:  5 000 ms
*float = FloatMath.ceil_f(f)*

             1 threads, Tavg =     85,42 ns/op (σ =   0,00 ns/op), Total
ops =     58534818 [    85,42 (58534818)]
2 threads, Tavg =     85,56 ns/op (σ =   0,18 ns/op), Total ops =
116880361 [    85,74 (58318655),     85,38 (58561706)]
             3 threads, Tavg =     85,49 ns/op (σ =   0,11 ns/op), Total
ops = 175469910 [    85,64 (58386401),     85,42 (58535723),     85,40
(58547786)]
*            4 threads, Tavg =     86,10 ns/op (σ =   0,86 ns/op),*
Total ops =    232739544 [    87,59 (57200792),     85,61
(58519538),     85,47 (58617544),     85,79 (58401670)]
#

#-------------------------------------------------------------
*# FloatMathCeil:* run duration:  5 000 ms
*int = FloatMath.ceil(f)*

             1 threads, Tavg =     56,72 ns/op (σ =   0,00 ns/op), Total
ops =     88153017 [    56,72 (88153017)]
2 threads, Tavg =     56,90 ns/op (σ =   0,16 ns/op), Total ops =
175737994 [    57,06 (87626873),     56,75 (88111121)]
             3 threads, Tavg =     56,82 ns/op (σ =   0,15 ns/op), Total
ops = 264003134 [    57,02 (87684429),     56,76 (88087214),     56,67
(88231491)]
*            4 threads, Tavg =     57,16 ns/op (σ =   0,57 ns/op),*
Total ops =    350060098 [    58,12 (86072473),     56,74
(88161260),     56,68 (88251450),     57,12 (87574915)]
#

#-------------------------------------------------------------
*# StrictMathFloor_f:* run duration:  5 000 ms
*float = (float) StrictMath.floor(f)*

             1 threads, Tavg =    108,69 ns/op (σ =   0,00 ns/op), Total
ops =     46005419 [   108,69 (46005419)]
2 threads, Tavg =    108,87 ns/op (σ =   0,25 ns/op), Total ops =
91856264 [   109,11 (45824174),    108,62 (46032090)]
             3 threads, Tavg =    108,66 ns/op (σ =   0,01 ns/op), Total
ops = 138046291 [   108,65 (46019660),    108,68 (46008068),    108,65
(46018563)]
*            4 threads, Tavg =    109,99 ns/op (σ =   1,00 ns/op),*
Total ops =    182162538 [   111,63 (44870853),    109,77 (45631259),
108,90 (45994047),    109,69 (45666379)]
#

#-------------------------------------------------------------
*# FloatMathFloor_f:* run duration:  5 000 ms
*float = FloatMath.floor_f(f)*

             1 threads, Tavg =     79,60 ns/op (σ =   0,00 ns/op), Total
ops =     62816917 [    79,60 (62816917)]
2 threads, Tavg =     79,44 ns/op (σ =   0,15 ns/op), Total ops =
125890579 [    79,58 (62827873),     79,29 (63062706)]
             3 threads, Tavg =     79,38 ns/op (σ =   0,15 ns/op), Total
ops = 188968096 [    79,59 (62823628),     79,23 (63107367),     79,32
(63037101)]
*            4 threads, Tavg =     79,88 ns/op (σ =   0,83 ns/op),*
Total ops =    250828233 [    81,31 (61604026),     79,60
(62930953),     79,32 (63149634),     79,33 (63143620)]
#

#-------------------------------------------------------------
*# FloatMathFloor:* run duration:  5 000 ms
*float = FloatMath.floor(f)*

             1 threads, Tavg =     70,20 ns/op (σ =   0,00 ns/op), Total
ops =     71226367 [    70,20 (71226367)]
2 threads, Tavg =     70,35 ns/op (σ =   0,16 ns/op), Total ops =
142141053 [    70,51 (70910131),     70,20 (71230922)]
             3 threads, Tavg =     70,26 ns/op (σ =   0,08 ns/op), Total
ops = 213504247 [    70,20 (71225449),     70,38 (71046834),     70,19
(71231964)]
*            4 threads, Tavg =     70,67 ns/op (σ =   0,60 ns/op),*
Total ops =    283376128 [    70,24 (71279973),     70,58
(70931050),     70,20 (71320272),     71,68 (69844833)]
#
