Jim,
Here is an updated webrev:
http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.1/
Changes:
- enabled CHECK_NAN and CHECK_OVERFLOW to be correct for now
- renamed the faster alternatives to int ceil_int(float) and float
floor_int(float), which are faster within the integer domain
- restored ceil_f / floor_f (float) methods that are strictly correct as
(float) StrictMath.ceil/floor(double)
- made FloatMath class and its methods public to be available for tests
and maybe more general use in graphics / java2d ...
It is still faster than the previous FloatMath, and Marlin is a bit faster too:
see the results at the end!
Here are a few comments on Joe's proposal:
>> I could propose my implementations of float ceil/floor (float) that are
>> exactly giving the same results than (float)StrictMath.ceil/floor(double).
>> According to my benchmarks, it is 25% faster.
>
>
> I don't think we need to limit ourselves to either StrictMath or Math. We
> simply need something predictable that has properties which work for our needs.
>
I was just proposing that the 2 methods float ceil/floor (float) (derived
from StrictMath) be included in the core libs if they are useful for
general use (25% faster).
Joe, are you interested in the ceil_f / floor_f variants (25% faster than
StrictMath)?
> So, you can *almost* get away with
>
> int ceil_returning_int(float f) {
>     if (f > 0.0)
>         return - ((int)(-f));
>     else
>         return (int) f;
> }
>
> int floor_returning_int(float f) {
>     if (f < 0.0)
>         return - ((int)(-f));
>     else
>         return (int) f;
> }
I tried Joe's proposal but it does not work:
rounding toward zero is not equivalent to ceil or floor!
> In what way do Joe's techniques fail? Integer casts should be a truncate
> operation (is that what you refer to as "round to zero"?) and should be the
> same as floor() for non-negative numbers and -((int)(-v)) should be the same
> as floor for negative numbers...
I tried it and it does not work:
ceil(1.2) = 2
but (int)(-1.2) = -1 (round to zero),
so -((int)(-1.2)) gives 1 and not 2!
That's why my variant adds / subtracts 1.
But it makes infinity / NaN handling more painful and a bit costly.
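
To make the difference concrete, here is a minimal, self-contained sketch. It is
not the FloatMath code from the webrev: the class and method names
(TruncationDemo, ceilByTruncation, ceilInt, floorInt) are made up for
illustration, and the add / subtract 1 correction is only one plausible way to
write such a variant. It deliberately ignores NaN, infinity and values outside
the int range, which is exactly the handling that costs extra.

```java
final class TruncationDemo {

    // The negate / cast / negate idea quoted above: a cast truncates toward
    // zero, and negating twice still truncates toward zero.
    static int ceilByTruncation(float f) {
        return (f > 0.0f) ? -((int) (-f)) : (int) f;
    }

    // Illustrative variant: truncate, then add 1 if truncation rounded down.
    // Assumes f is finite and within the int range (no NaN / overflow checks).
    static int ceilInt(float f) {
        final int i = (int) f;
        return (f > i) ? i + 1 : i;
    }

    // Illustrative variant: truncate, then subtract 1 if truncation rounded up.
    // Same assumptions as ceilInt.
    static int floorInt(float f) {
        final int i = (int) f;
        return (f < i) ? i - 1 : i;
    }

    public static void main(String[] args) {
        final float[] samples = { 1.2f, -1.2f, 3.0f, -3.0f, 0.9f, -0.9f };
        for (float f : samples) {
            System.out.println("f=" + f
                + " truncation=" + ceilByTruncation(f)
                + " ceilInt=" + ceilInt(f)
                + " floorInt=" + floorInt(f)
                + " StrictMath.ceil=" + StrictMath.ceil(f)
                + " StrictMath.floor=" + StrictMath.floor(f));
        }
    }
}
```

For 1.2f the truncation-based version returns 1 while ceilInt returns 2, which
is the mismatch described above.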
Jim, next I will test:
1/ using a proper and consistent ceil(coord - 0.5), as you did in openpisces (FX)
2/ using a fixed-point approach (longer work) so that the Marlin rendering loop
(crossings) only uses integer math
=> faster (no float to int conversions?) but also more scalable on
hyper-threaded CPUs?
=> the edge array would then only contain int[] and Unsafe usage would no
longer be necessary
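
As a rough illustration of how the two ideas could combine, the sketch below
converts a coordinate to fixed point once and then evaluates ceil(coord - 0.5)
with integer operations only. Everything in it is an assumption for discussion:
the 8 fractional bits, the class name FixedPointDemo and the method names
toFixed / ceilMinusHalf are hypothetical and are not taken from Marlin or
openpisces.

```java
final class FixedPointDemo {

    // Hypothetical format: 8 fractional bits (values are multiples of 1/256).
    static final int SHIFT = 8;
    static final int ONE = 1 << SHIFT;
    static final int HALF = ONE >> 1;

    // One float to int conversion per coordinate, rounding toward -infinity.
    // Assumes the scaled value fits in an int.
    static int toFixed(float coord) {
        return (int) Math.floor((double) coord * ONE);
    }

    // ceil(x - 0.5) on a fixed-point value: subtract HALF, add (ONE - 1),
    // then use an arithmetic shift, which floors for negative values too.
    static int ceilMinusHalf(int fixed) {
        return (fixed - HALF + ONE - 1) >> SHIFT;
    }

    public static void main(String[] args) {
        final float[] coords = { 1.5f, 1.6f, 0.5f, -0.4f, -0.5f };
        for (float c : coords) {
            System.out.println("coord=" + c
                + " fixed-point=" + ceilMinusHalf(toFixed(c))
                + " reference=" + (int) Math.ceil(c - 0.5));
        }
    }
}
```

Note that the fixed-point result can differ from the float result for
coordinates that are not exactly representable with the chosen number of
fractional bits, so the precision would need to be picked carefully.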
Cheers,
Laurent
PS: Here are some benchmark results made on values only in the integer
domain:
JVM START: 1.8.0_60-ea [Java HotSpot(TM) 64-Bit Server VM 25.60-b18]
floats = [-2.13422758E9, -1.37992608E8, -134758.4, -131.5, -17.2, -1.9,
-0.9, -1.0E-4, -1.0E-8, -1.0E-23, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.5, 17.2, 1.9, 0.9, 1.0E-4, 1.0E-8, 1.0E-23,
2.13422758E9, 1.37992608E8, 134758.4]
strictMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]
floatMathCeil = [-2134227584, -137992608, -134758, -131, -17, -1, 0,
0, 0, 0, -100, -3, -1, 0, 0, 0, 1, 3, 100, 132, 18, 2, 1, 1, 1, 1,
2134227584, 137992608, 134759]
FloatMathCeil_f = [-2.13422758E9, -1.37992608E8, -134758.0, -131.0,
-17.0, -1.0, -0.0, -0.0, -0.0, -0.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 132.0, 18.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.13422758E9,
1.37992608E8, 134759.0]
strictMathFloor_f = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, -0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
floatMathFloor_f = [-2.13422758E9, -1.37992608E8, -134759.0, -132.0,
-18.0, -2.0, -1.0, -1.0, -1.0, -1.0, -100.0, -3.0, -1.0, 0.0, 0.0, 0.0,
1.0, 3.0, 100.0, 131.0, 17.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.13422758E9,
1.37992608E8, 134758.0]
--- Benchmarks ---
# Calib: run duration: 5 000 ms
4 threads, Tavg = 3,03 ns/op (σ = 0,02 ns/op), Total ops =
6616663415 [ 3,07 (1633532581), 3,02 (1656839291), 3,01
(1662195896), 3,01 (1664095647)]
#
#-------------------------------------------------------------
*# StrictMathCeil_f:* run duration: 5 000 ms
*float = (float) StrictMath.ceil(f)*
1 threads, Tavg = 112,46 ns/op (σ = 0,00 ns/op), Total
ops = 44462614 [ 112,46 (44462614)]
2 threads, Tavg = 112,53 ns/op (σ = 0,20 ns/op), Total ops =
88864503 [ 112,74 (44351706), 112,33 (44512797)]
3 threads, Tavg = 112,75 ns/op (σ = 0,31 ns/op), Total
ops = 133042189 [ 112,67 (44379882), 112,42 (44478562), 113,17
(44183745)]
* 4 threads, Tavg = 113,61 ns/op (σ = 1,18 ns/op),*
Total ops = 176004512 [ 115,63 (43242922), 113,27 (44144214),
112,59 (44409190), 113,01 (44208186)]
#
#-------------------------------------------------------------
*# FloatMathCeil_f:* run duration: 5 000 ms
*float = FloatMath.ceil_f(f)*
1 threads, Tavg = 85,42 ns/op (σ = 0,00 ns/op), Total
ops = 58534818 [ 85,42 (58534818)]
2 threads, Tavg = 85,56 ns/op (σ = 0,18 ns/op), Total ops =
116880361 [ 85,74 (58318655), 85,38 (58561706)]
3 threads, Tavg = 85,49 ns/op (σ = 0,11 ns/op), Total
ops = 175469910 [ 85,64 (58386401), 85,42 (58535723), 85,40
(58547786)]
* 4 threads, Tavg = 86,10 ns/op (σ = 0,86 ns/op),*
Total ops = 232739544 [ 87,59 (57200792), 85,61
(58519538), 85,47 (58617544), 85,79 (58401670)]
#
#-------------------------------------------------------------
*# FloatMathCeil:* run duration: 5 000 ms
*int = FloatMath.ceil(f)*
1 threads, Tavg = 56,72 ns/op (σ = 0,00 ns/op), Total
ops = 88153017 [ 56,72 (88153017)]
2 threads, Tavg = 56,90 ns/op (σ = 0,16 ns/op), Total ops =
175737994 [ 57,06 (87626873), 56,75 (88111121)]
3 threads, Tavg = 56,82 ns/op (σ = 0,15 ns/op), Total
ops = 264003134 [ 57,02 (87684429), 56,76 (88087214), 56,67
(88231491)]
* 4 threads, Tavg = 57,16 ns/op (σ = 0,57 ns/op),*
Total ops = 350060098 [ 58,12 (86072473), 56,74
(88161260), 56,68 (88251450), 57,12 (87574915)]
#
#-------------------------------------------------------------
*# StrictMathFloor_f:* run duration: 5 000 ms
*float = (float) StrictMath.floor(f)*
1 threads, Tavg = 108,69 ns/op (σ = 0,00 ns/op), Total
ops = 46005419 [ 108,69 (46005419)]
2 threads, Tavg = 108,87 ns/op (σ = 0,25 ns/op), Total ops =
91856264 [ 109,11 (45824174), 108,62 (46032090)]
3 threads, Tavg = 108,66 ns/op (σ = 0,01 ns/op), Total
ops = 138046291 [ 108,65 (46019660), 108,68 (46008068), 108,65
(46018563)]
* 4 threads, Tavg = 109,99 ns/op (σ = 1,00 ns/op),*
Total ops = 182162538 [ 111,63 (44870853), 109,77 (45631259),
108,90 (45994047), 109,69 (45666379)]
#
#-------------------------------------------------------------
*# FloatMathFloor_f:* run duration: 5 000 ms
*float = FloatMath.floor_f(f)*
1 threads, Tavg = 79,60 ns/op (σ = 0,00 ns/op), Total
ops = 62816917 [ 79,60 (62816917)]
2 threads, Tavg = 79,44 ns/op (σ = 0,15 ns/op), Total ops =
125890579 [ 79,58 (62827873), 79,29 (63062706)]
3 threads, Tavg = 79,38 ns/op (σ = 0,15 ns/op), Total
ops = 188968096 [ 79,59 (62823628), 79,23 (63107367), 79,32
(63037101)]
* 4 threads, Tavg = 79,88 ns/op (σ = 0,83 ns/op),*
Total ops = 250828233 [ 81,31 (61604026), 79,60
(62930953), 79,32 (63149634), 79,33 (63143620)]
#
#-------------------------------------------------------------
*# FloatMathFloor:* run duration: 5 000 ms
*float = FloatMath.floor(f)*
1 threads, Tavg = 70,20 ns/op (σ = 0,00 ns/op), Total
ops = 71226367 [ 70,20 (71226367)]
2 threads, Tavg = 70,35 ns/op (σ = 0,16 ns/op), Total ops =
142141053 [ 70,51 (70910131), 70,20 (71230922)]
3 threads, Tavg = 70,26 ns/op (σ = 0,08 ns/op), Total
ops = 213504247 [ 70,20 (71225449), 70,38 (71046834), 70,19
(71231964)]
* 4 threads, Tavg = 70,67 ns/op (σ = 0,60 ns/op),*
Total ops = 283376128 [ 70,24 (71279973), 70,58
(70931050), 70,20 (71320272), 71,68 (69844833)]
#