Re: [OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Jim Graham Mon, 24 Aug 2015 15:09:13 -0700

Hi Laurent,

I'm feeling much better this week (and can even start typing with my badhand in short spurts now), so I'll hopefully be more on the ball now.

Short version - looks good to go, consider the following input at yourown discretion and go ahead and push.


Reviewing s3.4:

On 8/17/15 2:34 PM, Laurent Bourgès wrote:

Changes:
- Renderer: remove unused D_ERR_STEP_MAX and 0x1 replaced by 1 for
readability

Renderer, line 319 - it makes sense to me that the scale changes wouldbe based off of the same metric. Basically the constraint is keepingthe speed (dxy) between INC and DEC bounds. I'll admit, though, that Ihaven't been through the actual derivations of the math here, but Ithought I'd give my 2 cents on your comment there.

Renderer, line 479 - I often put extra parens in for cases like thisbecause it isn't common knowledge whether cast or add is higher inprecedence. I generally assume no better knowledge than the basicPEMDAS rules that we learn in grade school. But that's just my style (Ialso don't have the OOO table memorized, but maybe that's just mysenility ;).

0x1 vs 1 - I usually use 0x1 when I'm using numbers for their bitpatterns, so | and & typically use 0x constants in my style philosophy.In the case of "1" it's a degenerate case that doesn't seem to matter,but the added 0x tells me to think in terms of bits, not cardinal values.

Renderer, note for future investigation - we store the direction flag inY_MAX, but then we need to do manipulation to process Y_MAX and we alsoneed more manipulation to transfer that bit to crossing values. What ifinstead we stored it in the lowest bit of curx? We'd then have 31.31fixed point, we'd have to process slope so that its whole part wasdoubled (to skip affecting the LSbit of curx) and we'd have to do thecarry using "((error >> 31) << 1)", but we'd get the transfer of thedirection bit to the crossing for free and all in all, the totaloperations may be reduced (at the cost of 1 bit of X coordinate range).

- MarlinRenderingEngine: optimized loop in pathTo()


Looks good

- Dasher: TODO remains related to dash length issue

OK.

Hope it is good now, I plan to work soon on the Array Cache & dispose code.

Looks great! Consider the style issues I mentioned above if you wishand otherwise it is good to go. I don't need to see any further webrevson those issues.

One more suggestion for optimization below can be considered insubsequent work:

    For CHECK_NAN, how fast is Math.isNaN?  Is it worth first comparing
    the intpart to the value that NaN converts to before calling isNaN?
    The majority case, by a wide margin, is that the value is not NaN.


That's the aim: I test first the most probable case (a <= intpart) for
ceil() and then check possible overflow and finally check for NaN values
to avoid adding / substracting 1 !
I made benchmarks and this implementation optimized for the integer
domain works very well: 2x faster.

I guess I wasn't clear. I understand why the 3 parts of the test are inthe order that they are in, but I was referring only to the "CHECK_NAN&& Float.isNaN(a)" clause. Is that worth adding one more test for"CHECK_NAN && intpart == NAN_TO_INT_VAL && Float.isNaN(a)" in that oneclause?

For the ceil_int case, a very common case is that a is non-negative andnon-integer. In that case, "a <= intpart" will fail and you will do theCHECK_NAN test before you can return "intpart+1".

For the floor_int case, you would only get to the CHECK_NAN case onnegative-non-integers which we don't expect a lot of, but in general,the added test might make it slightly faster.

    I was under the understanding that if a constant is declared final
    and static, then it is compiled directly into the code and there is
    no constant lookup.  Check the bytecodes and you should see a
    hard-coded literal 0x7fffffff everywhere it is used...


I agree but it seems slightly faster: probably loading the literal is a
bit slower than loading from the stack ?
Maybe some hotspot experts could give their opinion.
As this is the renderer hot loop, I prefer making it as fast as possible.

In the case of 0x7fffffff it may be because of the size of the constant.A smaller literal like "1" or "2" might be faster as an in-linedsource literal.

    I think that at least for now they should either be
    sun.java2d.renderer.marlin.* or
    just sun.java2d.marlin.*


Ok, I can rename them in the next webrev as it requires me to upgrade
all my testing & benchmarking scripts...

Sounds fine to me. This is more "something we need to finalize beforeintegrating into the master workspace" than an issue for any given webrev.

    But the more important question may be around how much benefit these
    bring compared to the additional testing overhead of so many
    combinations ?


I advocate it is not easy to test all combinations but it seems not a
reasonable testing objective.

As I said, it is mainly tuning or quality settings but only few are
important (subpixels X / Y, tile size, initial pixel size).

Another thing to consider is that customers would have to do lots oftesting of performance of tweaking these values and so they woulddiscover the problems and report them to us before they become relianton them. However, down the road they may accept version+1 of Marlininto their existing environment without fully testing and we mayintroduce a bug that might take them by surprise.

(Again, something to discuss and decide in the long term - not for thisparticular patch)...


                        ...jim

Re: [OpenJDK Rasterizer] Fwd: Re: Fwd: RFR: Marlin renderer #3

Reply via email to