Jim,

Here is an updated webrev: http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/
Changes:
- enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
- renamed the faster alternatives as int ceil_int(float) and float floor_int(float); they are faster in the integer domain
- restored the ceil_f / floor_f (float) methods, which are strictly correct, i.e. equal to (float) StrictMath.ceil/floor(double)
- made the FloatMath class and its methods public, so they are available to tests and maybe for more general use in graphics / java2d ...

It is still faster than the previous FloatMath, and Marlin is a bit faster too: see the results at the end!

Here are a few comments on Joe's proposal:

> >> I could propose my implementations of float ceil/floor (float) that
> >> give exactly the same results as (float) StrictMath.ceil/floor(double).
> >> According to my benchmarks, they are 25% faster.
> >
> > I don't think we need to limit ourselves to either StrictMath or Math.
> > We simply need something predictable that has properties which work for
> > our needs.
>
> I was just proposing the 2 methods float ceil/floor (float) (derived from
> StrictMath) to be included in the core libs, if they are useful for general
> use (25% faster).
> Joe, are you interested in the ceil_f / floor_f variants (25% faster than
> StrictMath)?
>
>> So, you can *almost* get away with
>>
>> int ceil_returning_int(float f) {
>>     if (f > 0.0)
>>         return -((int) (-f));
>>     else
>>         return (int) f;
>> }
>>
>> int floor_returning_int(float f) {
>>     if (f < 0.0)
>>         return -((int) (-f));
>>     else
>>         return (int) f;
>> }
>>
>> I tried Joe's proposal but it does not work:
>> rounding toward zero is not equivalent to ceil or floor!
>
> In what way do Joe's techniques fail? Integer casts should be a truncate
> operation (is that what you refer to as "round to zero"?) and should be the
> same as floor() for non-negative numbers, and -((int)(-v)) should be the same
> as floor for negative numbers...

I tried it and it does not work: ceil(1.2) = 2, but (int)(-1.2) = -1 (rounding toward zero), so -((int)(-1.2)) = 1 and not 2! That's why my variant adds/subtracts 1!
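To illustrate the point above: a cast to int truncates toward zero, which matches floor only for non-negative values and ceil only for non-positive values, so a non-integral value on the other side of zero needs a +1 or -1 correction. A minimal sketch in plain Java, assuming finite inputs within the int range; the class FloorCeilSketch and the methods ceilInt / floorInt are hypothetical names, not the FloatMath code in the webrev:

```java
public final class FloorCeilSketch {

    // Hypothetical helper: floor(f) as an int.
    // Assumes f is finite and within the int range (no NaN / infinity / overflow checks).
    static int floorInt(float f) {
        final int i = (int) f;          // truncates toward zero
        // Negative non-integral values were rounded up by truncation: step down by 1.
        return (f < i) ? i - 1 : i;
    }

    // Hypothetical helper: ceil(f) as an int, with the same assumptions as floorInt.
    static int ceilInt(float f) {
        final int i = (int) f;          // truncates toward zero
        // Positive non-integral values were rounded down by truncation: step up by 1.
        return (f > i) ? i + 1 : i;
    }

    public static void main(String[] args) {
        float[] samples = {-131.5f, -1.9f, -0.9f, -0.0f, 0.0f, 0.9f, 1.2f, 17.2f, 134758.4f};
        for (float f : samples) {
            // Reference values computed through StrictMath in double precision.
            int expectedCeil = (int) StrictMath.ceil(f);
            int expectedFloor = (int) StrictMath.floor(f);
            if (ceilInt(f) != expectedCeil || floorInt(f) != expectedFloor) {
                throw new AssertionError("mismatch for " + f);
            }
        }
        // Truncation-only ceil for 1.2 gives 1; the corrected version gives 2.
        System.out.println(-((int) -1.2f) + " vs " + ceilInt(1.2f)); // prints "1 vs 2"
    }
}
```

The sketch deliberately skips the special cases: for example, ceilInt(Float.POSITIVE_INFINITY) computes Integer.MAX_VALUE + 1 and wraps around to Integer.MIN_VALUE, which is the kind of case that makes the corrected variant more costly to harden.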
But it makes infinity / NaN handling more painful and a bit more costly.

Jim, next I will run tests to:
1/ use a proper and consistent ceil(coord - 0.5), as you did in OpenPisces (FX);
2/ use a fixed-point approach (longer work) so that only integer math is used in the Marlin rendering loop (crossings) => faster (no float to int conversions?) but also more scalable on hyper-threading CPUs? => the edge array will then only contain int[] and Unsafe usage is no longer necessary.

Cheers,
Laurent

PS: Here are some benchmark results made on values only in the integer domain:

>> JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]
floats = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2, -1.9, -0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23, 2.13422758E9, 1.37992608E8, 134758.4]
strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0, -17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9, 1.37992608E8, 134759.0]
floatMathCeil = [-2134227584, -137992608, -134758, -131, -17, -1, 0, 0, 0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1, 2134227584, 137992608, 134759]
FloatMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0, -17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9, 1.37992608E8, 134759.0]
strictMathFloor_f = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0, -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9, 1.37992608E8, 134758.0]
floatMathFloor = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0, -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9, 1.37992608E8, 134758.0]
floatMathFloor_f = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0, -18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0, 1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9, 1.37992608E8, 134758.0]

- Benchmarks ---
# Calib: run duration: 5 000 ms
4 threads, Tavg = 3,03 ns/op (σ = 0,02 ns/op), Total ops = 6616663415 [ 3,07 (1633532581), 3,02 (1656839291), 3,01 (1662195896), 3,01 (1664095647)]
#
#-------------------------------------------------------------
# StrictMathCeil_f: run duration: 5 000 ms
float = (float) StrictMath.ceil(f)
1 threads, Tavg = 112,46 ns/op (σ = 0,00 ns/op), Total ops = 44462614 [ 112,46 (44462614)]
2 threads, Tavg = 112,53 ns/op (σ = 0,20 ns/op), Total ops = 88864503 [ 112,74 (44351706), 112,33 (44512797)]
3 threads, Tavg = 112,75 ns/op (σ = 0,31 ns/op), Total ops = 133042189 [ 112,67 (44379882), 112,42 (44478562), 113,17 (44183745)]
4 threads, Tavg = 113,61 ns/op (σ = 1,18 ns/op), Total ops = 176004512 [ 115,63 (43242922), 113,27 (44144214), 112,59 (44409190), 113,01 (44208186)]
#
#-------------------------------------------------------------
# FloatMathCeil_f: run duration: 5 000 ms
float = FloatMath.ceil_f(f)
1 threads, Tavg = 85,42 ns/op (σ = 0,00 ns/op), Total ops = 58534818 [ 85,42 (58534818)]
2 threads, Tavg = 85,56 ns/op (σ = 0,18 ns/op), Total ops = 116880361 [ 85,74 (58318655), 85,38 (58561706)]
3 threads, Tavg = 85,49 ns/op (σ = 0,11 ns/op), Total ops = 175469910 [ 85,64 (58386401), 85,42 (58535723), 85,40 (58547786)]
4 threads, Tavg = 86,10 ns/op (σ = 0,86 ns/op), Total ops = 232739544 [ 87,59 (57200792), 85,61 (58519538), 85,47 (58617544), 85,79 (58401670)]
#
#-------------------------------------------------------------
# FloatMathCeil: run duration: 5 000 ms
int = FloatMath.ceil(f)
1 threads, Tavg = 56,72 ns/op (σ = 0,00 ns/op), Total ops = 88153017 [ 56,72 (88153017)]
2 threads, Tavg = 56,90 ns/op (σ = 0,16 ns/op), Total ops = 175737994 [ 57,06 (87626873), 56,75 (88111121)]
3 threads, Tavg = 56,82 ns/op (σ = 0,15 ns/op), Total ops = 264003134 [ 57,02 (87684429), 56,76 (88087214), 56,67 (88231491)]
4 threads, Tavg = 57,16 ns/op (σ = 0,57 ns/op), Total ops = 350060098 [ 58,12 (86072473), 56,74 (88161260), 56,68 (88251450), 57,12 (87574915)]
#
#-------------------------------------------------------------
# StrictMathFloor_f: run duration: 5 000 ms
float = (float) StrictMath.floor(f)
1 threads, Tavg = 108,69 ns/op (σ = 0,00 ns/op), Total ops = 46005419 [ 108,69 (46005419)]
2 threads, Tavg = 108,87 ns/op (σ = 0,25 ns/op), Total ops = 91856264 [ 109,11 (45824174), 108,62 (46032090)]
3 threads, Tavg = 108,66 ns/op (σ = 0,01 ns/op), Total ops = 138046291 [ 108,65 (46019660), 108,68 (46008068), 108,65 (46018563)]
4 threads, Tavg = 109,99 ns/op (σ = 1,00 ns/op), Total ops = 182162538 [ 111,63 (44870853), 109,77 (45631259), 108,90 (45994047), 109,69 (45666379)]
#
#-------------------------------------------------------------
# FloatMathFloor_f: run duration: 5 000 ms
float = FloatMath.floor_f(f)
1 threads, Tavg = 79,60 ns/op (σ = 0,00 ns/op), Total ops = 62816917 [ 79,60 (62816917)]
2 threads, Tavg = 79,44 ns/op (σ = 0,15 ns/op), Total ops = 125890579 [ 79,58 (62827873), 79,29 (63062706)]
3 threads, Tavg = 79,38 ns/op (σ = 0,15 ns/op), Total ops = 188968096 [ 79,59 (62823628), 79,23 (63107367), 79,32 (63037101)]
4 threads, Tavg = 79,88 ns/op (σ = 0,83 ns/op), Total ops = 250828233 [ 81,31 (61604026), 79,60 (62930953), 79,32 (63149634), 79,33 (63143620)]
#
#-------------------------------------------------------------
# FloatMathFloor: run duration: 5 000 ms
float = FloatMath.floor(f)
1 threads, Tavg = 70,20 ns/op (σ = 0,00 ns/op), Total ops = 71226367 [ 70,20 (71226367)]
2 threads, Tavg = 70,35 ns/op (σ = 0,16 ns/op), Total ops = 142141053 [ 70,51 (70910131), 70,20 (71230922)]
3 threads, Tavg = 70,26 ns/op (σ = 0,08 ns/op), Total ops = 213504247 [ 70,20 (71225449), 70,38 (71046834), 70,19 (71231964)]
4 threads, Tavg = 70,67 ns/op (σ = 0,60 ns/op), Total ops = 283376128 [ 70,24 (71279973), 70,58 (70931050), 70,20 (71320272), 71,68 (69844833)]
#
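One detail visible in the output above: for the input -0.0, (float) StrictMath.ceil/floor return -0.0, while FloatMathCeil_f, floatMathFloor and floatMathFloor_f print 0.0; the other entries agree. A check based on == would not catch this, because -0.0f == 0.0f is true in Java. A minimal sketch of a bit-level comparison, assuming a hypothetical functional interface FloatUnaryOp that stands in for the method under test (the actual FloatMath methods are not reproduced here):

```java
public final class SignedZeroCheck {

    // Hypothetical stand-in for a float -> float method such as a ceil_f variant.
    interface FloatUnaryOp {
        float apply(float f);
    }

    // Counts inputs where the candidate differs from (float) StrictMath.ceil(double)
    // at the bit level, so -0.0f and 0.0f are treated as different results.
    static int countCeilBitMismatches(float[] inputs, FloatUnaryOp candidate) {
        int mismatches = 0;
        for (float f : inputs) {
            float expected = (float) StrictMath.ceil(f);
            float actual = candidate.apply(f);
            // floatToIntBits keeps the sign bit of zero (and maps every NaN to one pattern).
            if (Float.floatToIntBits(expected) != Float.floatToIntBits(actual)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        float[] inputs = {-1.9f, -0.9f, -0.0f, 0.0f, 0.9f, 1.9f};
        // Invented candidate: returns +0.0 for both zeros, StrictMath otherwise.
        FloatUnaryOp dropsNegativeZero = f -> (f == 0.0f) ? 0.0f : (float) StrictMath.ceil(f);
        System.out.println(-0.0f == 0.0f);                                     // prints true
        System.out.println(countCeilBitMismatches(inputs, dropsNegativeZero)); // prints 1
    }
}
```

The invented candidate only mimics the 0.0 entries in the listing; whether the sign of zero matters depends on the caller, since the int-returning variants cannot carry it anyway.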