Jim, Here is the new webrev including your proposals to use 32.31 fixed-point maths and double-precision in addLine() as it is definitely better: http://cr.openjdk.java.net/~lbourges/marlin/marlin-s3.3/
I also fixed all findbugs warnings or added a comment / TODO to few of them. Changes: - ArrayCache: fixed concurrency issue with stats (added incXXX methods) - Curve: return instead of break in switch cases, added missing default cases, renamed field Ts to ts - Dasher: return instead of break in switch cases, added missing default cases * added "TODO: compare float values using epsilon:": to be discussed* - FloatMath: CHECK_OVERFLOW and CHECK_NAN enabled, removed ceil() and floor() methods, added powerOfTwoD(int) copied from Math - Helpers: return instead of break in switch cases, added missing default cases - MarlinCache: moved few constants to MarlinConst to fix circular dependency - MarlinProperties: new class to get marlin system properties (code from MarlinRenderingEngine) - MarlinConst: use the new MarlinProperties class to fix circular dependency moved several constants used by several classes to fix circular dependency (SUBPIXEL_LG_POSITIONS_X, SUBPIXEL_LG_POSITIONS_Y, SUBPIXEL_POSITIONS_X, SUBPIXEL_POSITIONS_Y, NORM_SUBPIXELS, MAX_AA_ALPHA, TILE_SIZE_LG, TILE_SIZE) - MarlinRenderingEngine: Improved getNormalizingPathIterator() to return the original path iterator if normalization is disabled that simplifies the 4 usages Improved thin stoke handling: getMinimumAAPenSize() returns MIN_PEN_SIZE ~ 1/8f added missing default cases in switch cases moved code dealing with marlin system properties in MarlinProperties - Renderer: removed fractToInt() added double POWER_2_TO_32 = FloatMath.powerOfTwoD(32) used to compute scalb(double, 32) inlined for performance removed several constants in MarlinConst added comments about cubic / quad inc/dec bind lengths: QUAD_COUNT_LG removed fixed initial count = 1 in quadBreakIntoLinesAndAdd and maxDD (missing abs calls) improved addLine(): added comments, use double-precision for improved accuracy and fixed point 32.32 format - RendererContext: use ArrayCache incXXX methods - RendererStats: fixed concurrency warning related to the RendererStats singleton field - Stroker: added missing default cases in switch cases - TransformingPathConsumer2D: fixed field names (lower case as expected by findbugs) - Version: increased version to 0.7.0 My answers below: 2015-07-29 5:11 GMT+02:00 Jim Graham <james.gra...@oracle.com>: > > NormalizingIterator.NearestPixel should probably be renamed > NearestPixelQuarter to reflect its functionality (and > RenderCtx.nPQPathIterator). > done. > getNormalizing() - why not just return src for mode OFF? That way you > don't even have to test in the calling function, though it does add an > extra method call in the non-normalizing case, it simplifies the code - and > hopefully the method is simple enough to be inlined. > done. > > Renderer.java, line 45: fractToInt() is never used. > removed. > > Renderer.java, line 203: typo in comment -> 96K? > fixed. > > Renderer.java, line 379: Instead, mention that y values are already > shifted by -0.5 in tosubpixy()? > (The "note" comment makes it appear that that is work that is left to be > done, but it is already done) > I've taken your comments in your more recent email. > > Renderer.java, line 461: What happens if x1_cor was already at VPC? The > error should be 0, but you end up setting it to ERR_STEP_MAX. > Fixed using your 32.31 approach using double-precision maths (slope) as it seems slightly faster ~ 1%. " final double x1d = x1; final double y1d = y1; final double slope = (x2 - x1d) / (y2 - y1d); ... final double x1_intercept = x1d + (firstcrossing - y1d) * slope; final long x1_fixed_biased = ((long) scalb(x1_intercept, 32)) + 0x7fffffffL; store(CURX, (int) (x1_fixed_biased >> 32)); store(ERRX, ((int) x1_fixed_biased) >>> 1); .... final long slope_fixed = (long) scalb(slope, 32); store(BUMPX, (int) (slope_fixed >> 32)); store(BUMPXERR, ((int) slope_fixed) >>> 1); " > Renderer.java, line 792: Do you need to copy ERR_STEP_MAX into a local? > It is a statically initialized final int - that should already be a > constant that gets copied into the code. > I always copy constants used in loops to avoid repeated constant lookups. > Renderer.java, line 1350-ish: mention in here somewhere that edgeMinMaxY > values have already been adjusted by -0.5 in tosubpixy(), but edgeMinMaxX > values have not yet been adjusted? (It took me a moment to realize that.) > done. > > CeilAndFloorTests - whoa, hex floating point literals? That's a new one > for me... ;) > > I'm guessing that p22 has 1 bit of fractional mantissa and p23 has no > fractional mantissa digits? > Yes; I got it from an existing test : /jdk/test/java/lang/Math/CeilAndFloorTests.java but adapted to float constants. Sorry for my late reply, but it took me some time to make all changes, test, benchmark and tune the code. Cheers, Laurent PS: Here is the latest Marlin results (OpenJDK9 but not up-to-date) that I hope we will confirm soon with J2DBench MT: Test Threads Ops Med Pct95 Avg StdDev Min Max TotalOps [ms/op] dc_boulder_2013-13-30-06-13-17.ser 1 115 90.712 91.734 90.871 0.426 90.253 92.586 115 dc_boulder_2013-13-30-06-13-17.ser 2 230 91.540 91.736 91.544 0.131 91.211 92.278 230 dc_boulder_2013-13-30-06-13-17.ser 4 460 92.652 92.807 92.647 0.108 92.234 93.074 460 dc_boulder_2013-13-30-06-13-20.ser 1 217 47.605 48.383 47.726 0.313 47.394 48.468 217 dc_boulder_2013-13-30-06-13-20.ser 2 434 49.083 49.813 48.928 0.497 48.080 50.125 434 dc_boulder_2013-13-30-06-13-20.ser 4 868 49.726 50.255 49.631 0.524 48.741 55.242 868 dc_shp_alllayers_2013-00-30-07-00-43.ser 1 251 39.892 40.081 39.879 0.131 39.543 40.186 251 dc_shp_alllayers_2013-00-30-07-00-43.ser 2 502 40.873 41.588 40.899 0.604 39.935 41.773 502 dc_shp_alllayers_2013-00-30-07-00-43.ser 4 1004 41.578 42.974 41.732 0.792 40.440 43.934 1004 dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 776.726 776.985 776.729 0.177 776.365 777.169 25 dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 784.572 785.425 784.423 0.836 781.791 785.673 50 dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 784.838 789.145 785.661 2.856 783.250 799.128 100 dc_spearfish_2013-11-30-06-11-15.ser 1 819 12.785 12.867 12.799 0.044 12.756 13.272 819 dc_spearfish_2013-11-30-06-11-15.ser 2 1638 12.868 12.950 12.866 0.054 12.805 13.323 1638 dc_spearfish_2013-11-30-06-11-15.ser 4 3276 12.915 13.008 12.921 0.065 12.855 13.600 3276 dc_spearfish_2013-11-30-06-11-19.ser 1 1593 6.574 6.826 6.613 0.088 6.557 7.008 1593 dc_spearfish_2013-11-30-06-11-19.ser 2 3186 6.619 6.856 6.631 0.086 6.545 7.021 3186 dc_spearfish_2013-11-30-06-11-19.ser 4 6372 6.629 6.881 6.657 0.094 6.573 8.548 6372 dc_topp:states_2013-11-30-06-11-06.ser 1 845 12.418 12.483 12.426 0.029 12.329 12.600 845 dc_topp:states_2013-11-30-06-11-06.ser 2 1690 12.464 12.504 12.461 0.027 12.343 12.615 1690 dc_topp:states_2013-11-30-06-11-06.ser 4 3380 12.460 12.547 12.465 0.084 12.346 16.590 3380 dc_topp:states_2013-11-30-06-11-07.ser 1 1382 7.574 7.636 7.578 0.025 7.484 7.963 1382 dc_topp:states_2013-11-30-06-11-07.ser 2 2764 7.592 7.620 7.591 0.026 7.447 7.725 2764 dc_topp:states_2013-11-30-06-11-07.ser 4 5528 7.601 7.635 7.601 0.029 7.475 8.238 5528 test_z_625k.ser 1 63 163.317 163.761 163.372 0.198 163.149 164.151 63 test_z_625k.ser 2 126 165.225 165.391 165.241 0.092 165.028 165.704 126 test_z_625k.ser 4 252 166.399 166.602 166.417 0.140 166.076 167.473 252 Scores: Tests 27 9 9 9 Threads 4 1 2 4 Pct95 130.241 128.973 130.432 131.317 Performance and scalability is very good and always faster than ductus: Test Threads Ops Med Pct95 Avg StdDev Min Max TotalOps [ms/op] dc_boulder_2013-13-30-06-13-17.ser 1 93 111.932 112.526 111.971 0.315 111.268 112.869 93 dc_boulder_2013-13-30-06-13-17.ser 2 186 163.722 167.732 164.146 2.335 149.394 181.485 186 dc_boulder_2013-13-30-06-13-17.ser 4 372 306.995 319.968 311.124 29.269 244.478 547.289 372 dc_boulder_2013-13-30-06-13-20.ser 1 188 56.128 56.771 56.194 0.294 55.684 57.111 188 dc_boulder_2013-13-30-06-13-20.ser 2 376 94.648 95.344 94.677 0.777 93.045 103.992 376 dc_boulder_2013-13-30-06-13-20.ser 4 752 181.442 187.893 182.491 13.768 115.947 392.881 752 dc_shp_alllayers_2013-00-30-07-00-43.ser 1 219 48.030 48.336 48.003 0.220 47.441 48.697 219 dc_shp_alllayers_2013-00-30-07-00-43.ser 2 438 72.272 74.389 72.496 1.023 70.410 78.780 438 dc_shp_alllayers_2013-00-30-07-00-43.ser 4 876 136.748 144.031 136.785 4.872 120.915 169.034 876 dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 1032.341 1041.777 1032.706 6.047 1019.945 1044.672 25 dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 1552.967 1596.590 1550.010 31.675 1484.548 1604.336 50 dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 3714.267 4140.517 3765.731 239.272 2611.855 4231.175 100 dc_spearfish_2013-11-30-06-11-15.ser 1 617 16.671 17.103 16.691 0.210 16.351 17.269 617 dc_spearfish_2013-11-30-06-11-15.ser 2 1234 24.840 25.503 24.862 1.226 23.007 42.378 1234 dc_spearfish_2013-11-30-06-11-15.ser 4 2468 47.588 50.606 49.165 11.989 16.473 170.363 2468 dc_spearfish_2013-11-30-06-11-19.ser 1 1388 7.685 7.876 7.685 0.132 7.448 7.975 1388 dc_spearfish_2013-11-30-06-11-19.ser 2 2776 11.604 11.970 11.587 0.660 10.692 27.675 2776 dc_spearfish_2013-11-30-06-11-19.ser 4 5552 21.594 24.148 21.926 2.862 7.449 78.296 5552 dc_topp:states_2013-11-30-06-11-06.ser 1 625 16.883 16.969 16.888 0.042 16.733 17.023 625 dc_topp:states_2013-11-30-06-11-06.ser 2 1250 21.450 21.939 21.451 0.457 16.804 28.310 1250 dc_topp:states_2013-11-30-06-11-06.ser 4 2500 36.366 38.736 36.475 2.401 20.357 82.360 2500 dc_topp:states_2013-11-30-06-11-07.ser 1 942 11.117 11.186 11.113 0.057 10.947 11.626 942 dc_topp:states_2013-11-30-06-11-07.ser 2 1884 13.785 14.154 13.807 0.565 11.128 29.540 1884 dc_topp:states_2013-11-30-06-11-07.ser 4 3768 21.545 22.764 21.691 3.340 11.165 121.082 3768 test_z_625k.ser 1 50 211.271 214.208 211.679 1.548 208.421 215.475 50 test_z_625k.ser 2 100 345.324 356.823 347.145 6.430 335.767 369.519 100 test_z_625k.ser 4 200 680.916 948.729 711.352 83.520 616.722 978.978 200 Scores: Tests 27 9 9 9 Threads 4 1 2 4 Pct95 361.800 169.639 262.716 653.043