Jim,
2015-08-25 0:08 GMT+02:00 Jim Graham <james.gra...@oracle.com>:

> Hi Laurent,
>
> I'm feeling much better this week (and can even start typing with my bad
> hand in short spurts now), so I'll hopefully be more on the ball now.

Great! I hope your hand is not too bad.

> Short version - looks good to go, consider the following input at your own
> discretion and go ahead and push.

Excellent. Here are my comments:

> Reviewing s3.4:
>
> On 8/17/15 2:34 PM, Laurent Bourgès wrote:
>
>> Changes:
>> - Renderer: remove unused D_ERR_STEP_MAX and 0x1 replaced by 1 for
>>   readability
>
> Renderer, line 319 - it makes sense to me that the scale changes would be
> based off of the same metric. Basically the constraint is keeping the
> speed (dxy) between INC and DEC bounds. I'll admit, though, that I haven't
> been through the actual derivations of the math here, but I thought I'd
> give my 2 cents on your comment there.

I left a comment there, as changing the metric will require adjusting the
bounds + tests (postponed for later). According to my understanding, the
error between the linear segment and the curve is <= |acceleration(t)| / 8
(Graphics Gems I), so I would prefer using ddx|y. Besides, I would use the
squared norm = ddx*ddx + ddy*ddy instead of abs(ddx). I think the speed
dx|y describes the curve slope, so it does not correspond to the
approximation error, but it seems you already use such constraints in
another renderer (ProcessPath.c).

Another point: filled curves (handled by Renderer directly) are not
monotonic, so this error approximation is not valid there! Maybe we should
split the curve as the Stroker does (at cusps, inflection points ...).

> Renderer, line 479 - I often put extra parens in for cases like this
> because it isn't common knowledge whether cast or add is higher in
> precedence. I generally assume no better knowledge than the basic PEMDAS
> rules that we learn in grade school. But that's just my style (I also
> don't have the OOO table memorized, but maybe that's just my senility ;).

Done.

> 0x1 vs 1 - I usually use 0x1 when I'm using numbers for their bit
> patterns, so | and & typically use 0x constants in my style philosophy. In
> the case of "1" it's a degenerate case that doesn't seem to matter, but
> the added 0x tells me to think in terms of bits, not cardinal values.

Fixed.

> Renderer, note for future investigation - we store the direction flag in
> Y_MAX, but then we need to do manipulation to process Y_MAX and we also
> need more manipulation to transfer that bit to crossing values. What if
> instead we stored it in the lowest bit of curx? We'd then have 31.31 fixed
> point, we'd have to process slope so that its whole part was doubled (to
> skip affecting the LSbit of curx) and we'd have to do the carry using
> "((error >> 31) << 1)", but we'd get the transfer of the direction bit to
> the crossing for free and all in all, the total operations may be reduced
> (at the cost of 1 bit of X coordinate range).

It seems a very good idea to evaluate soon: it will save several shift
operations (+ 1 Unsafe access). I would also make error, bump_x and
bump_error 31.31 FP (LSB = 0) so it would be consistent and may simplify
the computations even more (rough sketch below).

Hope it is good now; I plan to work soon on the Array Cache & dispose code.

> Looks great! Consider the style issues I mentioned above if you wish and
> otherwise it is good to go. I don't need to see any further webrevs on
> those issues.

Thanks.
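Here is the rough, untested sketch of the curx packing idea mentioned
above (all names are illustrative, not the actual Renderer fields; I use
an unsigned shift so the extracted carry is 0 or 1):

    // Illustrative sketch only (hypothetical names, not the actual Marlin code):
    // pack the crossing as 31.31 fixed point with the edge orientation in bit 0
    // of curx; the 31 fractional bits live in a separate error accumulator.
    final class PackedCrossingSketch {

        // integer x in bits 31..1, orientation flag (0 or 1) in bit 0
        static int pack(final int x, final int orientation) {
            return (x << 1) | orientation;
        }

        // Advance one scanline (DDA step):
        //   curx      = packed crossing (see pack)
        //   bumpx     = whole part of the slope, doubled so its bit 0 stays 0
        //   error     = 31-bit fractional accumulator (bit 31 clear on entry)
        //   bumpError = 31-bit fractional slope increment
        // Returns the new packed curx; the caller keeps
        // (error + bumpError) & 0x7FFFFFFF as the new error.
        static int step(final int curx, final int bumpx,
                        final int error, final int bumpError) {
            final int e = error + bumpError;          // bit 31 becomes the carry
            return curx + bumpx + ((e >>> 31) << 1);  // carry injected at bit 1: flag untouched
        }

        public static void main(String[] args) {
            int orientation = 1;                  // e.g. a downward edge
            int curx  = pack(100, orientation);   // x = 100
            int bumpx = (3 << 1);                 // slope whole part = 3
            int error = 0x60000000;               // fractional part ~= 0.75
            int bumpError = 0x40000000;           // fractional slope ~= 0.5

            curx  = step(curx, bumpx, error, bumpError);
            error = (error + bumpError) & 0x7FFFFFFF;

            // 0.75 + 0.5 carries, so x = 100 + 3 + 1 = 104 and the flag is intact:
            System.out.println("x = " + (curx >> 1) + ", orientation = " + (curx & 0x1));
        }
    }

The crossing then inherits the orientation bit for free (curx & 0x1), at
the cost of 1 bit of x range, as you noted.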
I started working on RLE / pixel coverage encoding, but I still expect to
improve the Array Cache next.

> One more suggestion for optimization below can be considered in subsequent
> work:
>
>> For CHECK_NAN, how fast is Math.isNaN? Is it worth first comparing
>> the intpart to the value that NaN converts to before calling isNaN?
>> The majority case, by a wide margin, is that the value is not NaN.
>>
>> That's the aim: I test first the most probable case (a <= intpart) for
>> ceil() and then check possible overflow and finally check for NaN values
>> to avoid adding / subtracting 1!
>> I made benchmarks and this implementation optimized for the integer
>> domain works very well: 2x faster.
>
> I guess I wasn't clear. I understand why the 3 parts of the test are in
> the order that they are in, but I was referring only to the "CHECK_NAN &&
> Float.isNaN(a)" clause. Is that worth adding one more test for "CHECK_NAN
> && intpart == NAN_TO_INT_VAL && Float.isNaN(a)" in that one clause?
>
> For the ceil_int case, a very common case is that a is non-negative and
> non-integer. In that case, "a <= intpart" will fail and you will do the
> CHECK_NAN test before you can return "intpart+1".
>
> For the floor_int case, you would only get to the CHECK_NAN case on
> negative non-integers, which we don't expect a lot of, but in general the
> added test might make it slightly faster.

I quickly tried it but, sorry, it provides no gains: Float.isNaN() is just
implemented as (a != a), so it is faster with only 1 condition (a != a)
than with 2 (intpart == 0 && (a != a)).

>> But the more important question may be around how much benefit these
>> bring compared to the additional testing overhead of so many
>> combinations?
>>
>> I would argue it is not easy to test all combinations, but it does not
>> seem a reasonable testing objective.
>>
>> As I said, these are mainly tuning or quality settings, and only a few
>> are important (subpixels X / Y, tile size, initial pixel size).
>
> Another thing to consider is that customers would have to do lots of
> testing of the performance of tweaking these values, and so they would
> discover the problems and report them to us before they become reliant on
> them. However, down the road they may accept version+1 of Marlin into
> their existing environment without fully testing, and we may introduce a
> bug that might take them by surprise.

Let's discuss / review that production aspect once Marlin passes the tests
& benchmarks. I think several tuning settings may become useless by then.

Laurent
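PS: for illustration, here is a minimal sketch of the branch ordering
described above (hypothetical code, not the actual FloatMath
implementation; the point is only that the NaN test reduces to (a != a)
and comes last):

    // Hypothetical sketch (not the actual Marlin FloatMath code): ceil to int,
    // testing the common integer-domain case first and NaN last.
    static int ceil_int(final float a) {
        final int intpart = (int) a;       // truncates toward zero; (int) NaN == 0
        if (a <= intpart                   // already integral, or a negative non-integer
                || intpart == Integer.MAX_VALUE  // overflow: the cast already clamped
                || a != a) {               // NaN: same test as Float.isNaN(a)
            return intpart;
        }
        return intpart + 1;                // positive non-integer: round up
    }

The extra (intpart == 0) guard would add a comparison in front of (a != a)
rather than remove one, which matches the "no gains" result above.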