Hi Laurent,

I'm feeling much better this week (and can even start typing with my bad hand in short spurts now), so I'll hopefully be more on the ball now.

Short version - looks good to go, consider the following input at your own discretion and go ahead and push.

Reviewing s3.4:

On 8/17/15 2:34 PM, Laurent Bourgès wrote:
Changes:
- Renderer: remove unused D_ERR_STEP_MAX and 0x1 replaced by 1 for
readability

Renderer, line 319 - it makes sense to me that the scale changes would be based off of the same metric. Basically the constraint is keeping the speed (dxy) between INC and DEC bounds. I'll admit, though, that I haven't been through the actual derivations of the math here, but I thought I'd give my 2 cents on your comment there.

Renderer, line 479 - I often put extra parens in for cases like this because it isn't common knowledge whether cast or add is higher in precedence. I generally assume no better knowledge than the basic PEMDAS rules that we learn in grade school. But that's just my style (I also don't have the OOO table memorized, but maybe that's just my senility ;).

0x1 vs 1 - I usually use 0x1 when I'm using numbers for their bit patterns, so | and & typically use 0x constants in my style philosophy. In the case of "1" it's a degenerate case that doesn't seem to matter, but the added 0x tells me to think in terms of bits, not cardinal values.

Renderer, note for future investigation - we store the direction flag in Y_MAX, but then we need to do manipulation to process Y_MAX and we also need more manipulation to transfer that bit to crossing values. What if instead we stored it in the lowest bit of curx? We'd then have 31.31 fixed point, we'd have to process slope so that its whole part was doubled (to skip affecting the LSbit of curx) and we'd have to do the carry using "((error >> 31) << 1)", but we'd get the transfer of the direction bit to the crossing for free and all in all, the total operations may be reduced (at the cost of 1 bit of X coordinate range).

- MarlinRenderingEngine: optimized loop in pathTo()

Looks good

- Dasher: TODO remains related to dash length issue

OK.

Hope it is good now, I plan to work soon on the Array Cache & dispose code.

Looks great! Consider the style issues I mentioned above if you wish and otherwise it is good to go. I don't need to see any further webrevs on those issues.

One more suggestion for optimization below can be considered in subsequent work:

    For CHECK_NAN, how fast is Math.isNaN?  Is it worth first comparing
    the intpart to the value that NaN converts to before calling isNaN?
    The majority case, by a wide margin, is that the value is not NaN.


That's the aim: I test first the most probable case (a <= intpart) for
ceil() and then check possible overflow and finally check for NaN values
to avoid adding / substracting 1 !
I made benchmarks and this implementation optimized for the integer
domain works very well: 2x faster.

I guess I wasn't clear. I understand why the 3 parts of the test are in the order that they are in, but I was referring only to the "CHECK_NAN && Float.isNaN(a)" clause. Is that worth adding one more test for "CHECK_NAN && intpart == NAN_TO_INT_VAL && Float.isNaN(a)" in that one clause?

For the ceil_int case, a very common case is that a is non-negative and non-integer. In that case, "a <= intpart" will fail and you will do the CHECK_NAN test before you can return "intpart+1".

For the floor_int case, you would only get to the CHECK_NAN case on negative-non-integers which we don't expect a lot of, but in general, the added test might make it slightly faster.

    I was under the understanding that if a constant is declared final
    and static, then it is compiled directly into the code and there is
    no constant lookup.  Check the bytecodes and you should see a
    hard-coded literal 0x7fffffff everywhere it is used...


I agree but it seems slightly faster: probably loading the literal is a
bit slower than loading from the stack ?
Maybe some hotspot experts could give their opinion.
As this is the renderer hot loop, I prefer making it as fast as possible.

In the case of 0x7fffffff it may be because of the size of the constant. A smaller literal like "1" or "2" might be faster as an in-lined source literal.

    I think that at least for now they should either be
    sun.java2d.renderer.marlin.* or
    just sun.java2d.marlin.*


Ok, I can rename them in the next webrev as it requires me to upgrade
all my testing & benchmarking scripts...

Sounds fine to me. This is more "something we need to finalize before integrating into the master workspace" than an issue for any given webrev.

    But the more important question may be around how much benefit these
    bring compared to the additional testing overhead of so many
    combinations ?


I advocate it is not easy to test all combinations but it seems not a
reasonable testing objective.

As I said, it is mainly tuning or quality settings but only few are
important (subpixels X / Y, tile size, initial pixel size).

Another thing to consider is that customers would have to do lots of testing of performance of tweaking these values and so they would discover the problems and report them to us before they become reliant on them. However, down the road they may accept version+1 of Marlin into their existing environment without fully testing and we may introduce a bug that might take them by surprise.

(Again, something to discuss and decide in the long term - not for this particular patch)...

                        ...jim

Reply via email to