Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts

Philip Race Thu, 05 May 2016 16:00:25 -0700

Yes, this is ready to push.

-phil.


On 5/3/16, 12:23 AM, Jayathirth D V wrote:

Hi Prahalad,

Changes are working fine.+1.
In test case please make jtreg comments as multiline comments and move them 
below copyright and above import statements for readability before pushing.

Thanks,
Jay
-----Original Message-----
From: Jim Graham
Sent: Tuesday, May 03, 2016 1:42 AM
To: Prahalad Kumar Narayanan; [email protected]
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased 
text on translucent backgrounds gets bright artifacts

Thanks Prahalad,

Looks great - +1

                        ...jim

On 5/2/2016 8:29 AM, Prahalad Kumar Narayanan wrote:

Dear Jim&  Everyone on Java2D Community

First, thanks to Jim for every word in the feedback.

Java is used by millions of devices and users. We cannot compromise on quality.
It 's through these reviews and iterations that we ensure a consistent solution 
gets checked in.

I 've incorporated the suggestions from last review and changes are available 
in webrev.
Review Link: http://cr.openjdk.java.net/~jdv/Prahalad/8015070/webrev.06/

Quick Brief on Changes :
     1.  The macros- Declare ## DST ## SrcOverDstBlendFactor and Store ## DST 
## SrcOverDstBlendFactor are removed.
     2.  Their usage is now replaced with
                  DeclareAlphaVarFor4ByteArgb : For declaring the blend factor
                  SrcOver ## DST ## BlendFactor(dstF, dstA) : To return the 
appropriate blend factor between dstF and dstA
     3.  Most of the macro names contain image TYPE in the earlier part and 
ForSTRATEGY at the end of the name.
                  (Note: This is just an observation not a Thumb rule) Hence 
SrcOver ## DST ## BlendFactor macro name is being used.
     4.  As with every change in the native code, Internal automated build 
system was used to ensure successful compilation across all platforms.

Kindly review the changes and provide your opinion.

Thank you
Have a good day

Prahalad N.


-----Original Message-----
From: Jim Graham
Sent: Thursday, April 28, 2016 2:05 AM
To: Prahalad Kumar Narayanan; [email protected]
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased 
text on translucent backgrounds gets bright artifacts

Thanks Prahalad,

First, the macro design issues for all of these LoopMacros.h et al files
are pretty complex, so I apologize if this is a big learning step here
and if the feedback can look like nit-picking at times.  But the system
is complicated enough that we should take care to make sure the new work
remains consistent to what is already there and to keep this complicated
system maintainable in the long run.

The only issue I have with the new macros is that Store*() naming
convention is typically used when the macro itself does the assignment.
In the case of the new Store*BlendFactor() macros, they simply return
the value and the caller does the assignment, so the name is off.  This
could be fixed either by moving "blendF" to an argument to the macro, or
renaming it to one of the suggestions I gave earlier that don't imply
assignment inside the macro.

Another thing to consider is that the type of the blend factor is
actually determined by the alpha blend token "STRATEGY", not by the
image type.  Many of the alpha blend types are dependent on the image
types being used, but it is not solely determined by the image type,
sometimes it depends on the pair of src/dst image types.  In any case,
the STRATEGY used for alpha blending is not directly tied to an image
type.  Other alpha loop macros take a STRATEGY as an argument and invoke
the proper alpha blending value declaration and manipulation macros with
"For ## Strategy" macro name expansions.  But we are hard-coding
"For4ByteArgb" in this particular macro, which means we are hard-coding
the particular type of alpha math.  Since that determination is done by
hard-coding in this macro and not by delegating to the image type, then
it is inappropriate to ask the image-type-based macro to decide how to
declare the blend factor.  If anything, we should use
"DeclareAlphaVarFor4ByteArgb()" to declare the variable, not an
image-based macro.  Note that the Declare() macros for the alpha math
strategies take an argument called "VAR" so then you can assume that
they've named the variable the same token that you supplied.

My recommendation would be to delete the new Declare macros and replace
their use in LoopMacros.h with a usage of the existing
DeclareAlphaVarFor4ByteArgb() macro, and then I would move the resulting
variable into the argument list for the Store ## BlendFactor macro and
move the assignment part of the statement inside of the macro...

                        ...jim

On 4/27/16 6:37 AM, Prahalad Kumar Narayanan wrote:

Dear Jim and Everyone on Java2D Community

Good day to you.

First, Thanks to Jim for his feedback.
I 've incorporated his feedback in latest version of code and webrev link is 
shared below
         Webrev Link: 
http://cr.openjdk.java.net/~aghaisas/prahalad/8015070/webrev.05/

Quick Description on Changes :
1. Removing redundant code MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX)
         The redundant code within if and else blocks of if ( mixValSrc != 0xff 
) has been moved into block that does color blending.
         As Jim rightly said, the change would execute Multiply operation only 
if blending is required
         On the other hand, If resA reaches maximum value, the fast path is 
executed.

2. Fast path execution
         The fast path that directly sets foreground color has been moved into 
else block after checking if ( resA != MaxValFor4ByteArgb )
         Conceptually, resA would be maximum only when glyphAlpha (mixValSrc) 
and srcAlpha are maximum.

3. Isolated Declare and Store macros.
          A single macro DeclareAndInit... has been split into two macros 
Declare ## DST ## SrcOverDstBlendFactor and Store ## DST ## 
SrcOverDstBlendFactor.
          This would indeed address some compilers that do not allow 
declaration in the middle of the block.

4. Other Relevant Information:
         The changes have been verified to build on all platforms through JPRT 
auto build system.
         Java2D benchmarks were run with the changes and no degradation on 
performance was seen compared to webrev.04
         Regression was run and no new regression failures have been introduced 
with the change.

5. To reduce reviewer 's effort in review, code has been expanded with comments 
herewith.

#define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
                                   FG_PIXEL, PREFIX, SRC_PREFIX) \
     do { \
         DeclareAlphaVarFor4ByteArgb(resA) \
         DeclareCompVarsFor4ByteArgb(res) \
         jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
         if (mixValSrc) { \
             /* Evaluating srcAlpha component */ \
             if (mixValSrc != 0xff) { \
                 /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
                 PromoteByteAlphaFor4ByteArgb(mixValSrc); \
                 /* Glyph mask determines visibility of the srcColor */ \
                 resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
             } else { \
                 /* This is understood easier with floating point logic. */ \
                 /* 1.0f is max value used in floating point calculation */ \
                 /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
                 /* srcA without Multiply with glyph mask. */ \
                 resA = SRC_PREFIX ## A; \
             } \
            /* Blending srcColor and dstColor */ \
             /* This is required only when resA(i.e., srcA) is not maximum */ \
             if (resA != MaxValFor4ByteArgb) { \
                 DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
                 DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
                 DeclareCompVarsFor4ByteArgb(dst) \
                 DeclareCompVarsFor4ByteArgb(tmp) \
                 /* Redundant statement moved from previous if -else block */ \
                 MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
                 /* Based on the pixelFormat we could reduce calculations */ \
                 /* done to load the color and alpha components */ \
                 if (!(DST ## IsPremultiplied)) { \
                     Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
                                                dstA, dstR, dstG, dstB); \
                     Store4ByteArgbCompsUsingOp(tmp, =, dst); \
                 } else { \
                     Declare ## DST ## AlphaLoadData(DstPix) \
                     jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
                     DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
                                                                 pixelOffset); \
                     /* The macro's implementation loads color components  */ \
                     /* without divide by alpha adjustment as required for */ \
                     /* subsequent calculations. Note: This is used only   */ \
                     /* with preMultiplied alpha based pixel formats */ \
                     LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
                                                          DstPix, \
                                                          dst); \
                     Postload4ByteArgbFrom ## DST(pixelAddress, \
                                                  DstPix, \
                                                  tmp); \
                 } \
                 /* Avoid blending operatons if dst is fully transparent */ \
                 if (dstA) { \
                     Declare ## DST ## SrcOverDstBlendFactor(blendF) \
                     /* dstA would be 0 if either of the following is true.  */ 
\
                     /* 1. srcA is max. Parent if condition validates this.  */ 
\
                     /* 2. dstA is zero. The current if condition validates  */ 
\
                     /* Henceforth, the following Multiply need not be moved */ 
\
                     /* ahead of the if condition. This also helps to better */ 
\
                     /* performance */ \
                     dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
                     resA += dstA; \
                     /* Declares blendF variable and assigns appropriate */ \
                     /* alpha value. The definitions are contained in the*/ \
                     /* header files of respective pixel formats */ \
                     blendF = Store ## DST ## SrcOverDstBlendFactor(dstF, \
                                                                    dstA); \
                     /* This is understood easier with floating point logic.*/ \
                     /* 1.0f is max value used in floating point calculation*/ \
                     /* (1.0f * tmp) gives tmp. Hence we avoid Multiply     */ \
                     /* operation and directly add loaded color to result*/ \
                     if (blendF != MaxValFor4ByteArgb) { \
                         MultiplyAndStore4ByteArgbComps(tmp, \
                                                        blendF, \
                                                        tmp); \
                     } \
                     Store4ByteArgbCompsUsingOp(res, +=, tmp); \
                 } \
             } else { \
                 /* resA is maximum only when srcA and glyphAlpha are max   */ \
                 /* Hence the fast path to store foreground pixel color and */ \
                 /* break to the next pixel. */ \
                 Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
                                           FG_PIXEL, PREFIX); \
                 break; \
             } \
             /* In the above calculations, color values are multiplied with */ \
             /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
             /* based pixel formats. For non pre-multiplied alpha based     */ \
             /* pixel formats, the alpha is removed from color components   */ \
             /* and then stored to the resulting color */ \
             if (!(DST ## IsOpaque)&&  \
                 !(DST ## IsPremultiplied)&&  resA&&  \
                 resA<  MaxValFor4ByteArgb) \
             { \
                 DivideAndStore4ByteArgbComps(res, res, resA); \
             } \
             Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
                                                PIXEL_INDEX, res); \
         } \
     } while (0);

Kindly review the changes present in the webrev and provide your views.

Thank you once again for your effort in review
Have a great day

Prahalad N.


-----Original Message-----
From: Jim Graham
Sent: Wednesday, April 27, 2016 2:12 AM
To: Prahalad Kumar Narayanan; [email protected]
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased 
text on translucent backgrounds gets bright artifacts

Hi Prahalad,

One potential portability issue - you have a "DeclareAndInit" macro for the new 
"choose which blend factor" macro in the middle of a block.
Some C compilers allow declaring a new variable in the middle of a block, but not all.  
Since it would be hard to move the declaration to the top of the block, since it depends 
on a value computed in the first couple of lines, it might be better to change it from a 
"DeclareAndInit"
style macro into a regular declaration, and a macro to get the value, so "jint blendF;" at the top 
of the block and "blendF = SrcOverDstBlendFactorFor ## DST(...);" in the middle of the block.  (Or 
name it "SrcOver ## DST ## BlendFactor(...)?"

I just noticed something about this .04 version of the function.  At the top 
you have a test for mixValSrc != 255 and then both clauses of the if statement 
end with essentially the same code, multiplying the source by the value in 
resA.  (The version in the else clause uses SRC_PREFIX ## A, but it could just 
as easily also use resA since they are the same value at that point.)

This means you could potentially move both "MultiplyAndStore" macros until after the if 
and use resA as the multiplier.  Now, if you look, the immediate next statement tests if resA is 
maxVal.  If it is maxVal, then you don't need to do that multiply at all, so the macro could be 
moved even further to be inside the "if (resA != maxVal)" block.

At that point, you could then reinstate the "else" on that if block and move 
the Store##DST##PixelData into that else, to save you another test of resA.  This would 
essentially look like this:

if (mixValSrc != 255) {
      PromoteByteAlpha...;
      resA = MultiplyAlpha...;
} else {
      resA = SRC_PREFIX ## A;
}
if (resA != MaxVal) {
      Declare,declare,declare,declare...;
      Multiply...Comps(res, resA, SRC_PREFIX); } else {
      Store ## DST ## PixelData...;
}

It shortens the code a little, but I'm not sure if it would really help 
performance other than not having to do the test for maxVal twice.
Still, tests are fairly expensive in code like this so it could be worth a shot 
at testing, and it would simplify the code a bit in either case...

                        ...jim

On 4/15/16 12:25 AM, Prahalad Kumar Narayanan wrote:

Hello Jim&  Everyone on Java2D Community

Good day to you.

This is a review request and a follow-up to the bug-fix for the issue
Bug     : JDK-8015070
Link    : https://bugs.openjdk.java.net/browse/JDK-8015070

Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.04/

Quick Inferences on Changes in Current -Webrev.04

1 ) Subtle changes to the blend loop-
         . The subtle changes taken up, have helped to improve the performance.
         . With the current logic in webrev.04, Java2DBench reports better 
performance than Un-Optimized solution that was present in webrev.00
         . J2DBench was run for Font Styles { Plain, Bold, Italic, Bold n 
Italic } and  Font Sizes { 12, 20, 36, 72 }
         . My sincere thanks to Jim for all his detailed feedback through the 
multiple reviews that has evolved the solution today.

(Details on changes)
1.a.  Loading of Color components
         . When modelled as per SRCOVER_MASK_FILL code, the logic required few 
additional calculations to load color components.
         . The extra calculations indeed impacted performance figures.
         . This could be offset in  two possible ways
                  a. Inspect parent macro- NAME_SOLID_DRAWGLYPHLISTAA and 
advance by pixel address and not by pixel index.
                              The parent macro invokes 
GlyphListAABlendFourByteArgb through this macro- GlyphListAABlend ## 
STRATEGY(DST, pixels, x, pPix, fgpixel, solidpix, src);
                              Changing parent macro will cause spurious changes 
across GlyphListAABlend ## other pixel formats.
                              There is additional risk of breaking the stable 
and well optimized  LoopMacros.h.
                 b. Load color components based on pre-Multiplication status
                              This has been taken up and change is limited to 
the function being modified.
                              Thankfully J2DBench has still reported 
improvement in performance.

1.b.  New Macro to avoid if (DST ## IsPremultiplied) {
         . A new macro- DeclareAndInit ## DST ## SrcOverMaskBlendFactor has 
been introduced to choose between dstF, or dstA
         . The implementation is available in the header files of pixel formats 
( Eg: IntArgb.h IntArgbPre.h )
         . There are 29 different pixel formats known to Java2D, and
                  . Hence, the new macro's implementation is added only to 
those pixel formats that require the current glyph blending logic.

2 ) Testing across different formats
          . The Regression test code has been modified to test anti-aliased 
text rendering on 7 different pixel formats-
                     . IntArgb, IntArgb_Pre, FourByteAbgr, FourByteAbgr_Pre, 
IntRGB, IntBGR, 3ByteBGR.
          . As expected, the test fails without the fix on JDK 8 and JDK 7 
versions&  passes with JDK 9-internal containing the fix.

3 ) Explanation on Code Changes :
         . With multiple reviews and changes, today the code fixes the bug and 
is well optimized as well.
         . For ease of reviewer and effort in review, I 've explained the logic 
with /* comment statements */ herewith.
         . Note: These comments don't figure in the webrev.
                        As one cannot guarantee how /* comments */ within 
macros would be perceived by compiler across different platforms.

#define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
                                                                             
FG_PIXEL, PREFIX, SRC_PREFIX) \
     do { \
         DeclareAlphaVarFor4ByteArgb(resA) \
         DeclareCompVarsFor4ByteArgb(res) \
         jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
         if (mixValSrc) { \
             /* Evaluating srcColor components */ \
             if (mixValSrc != 0xff) { \
                 /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
                 PromoteByteAlphaFor4ByteArgb(mixValSrc); \
                 /* Glyph mask determines visibility of the srcColor */ \
                 resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
                 MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
             } else { \
                 /* If mixValSrc and srcA are maximum, then result color is */ \
                 /* opaque. Hence the fast path to store foreground pixel   */ \
                 /* color and return. */ \
                 if (SRC_PREFIX ## A == MaxValFor4ByteArgb) { \
                     Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
                                               FG_PIXEL, PREFIX); \
                     break; \
                 } \
                 /* This is understood easier with floating point logic. */ \
                 /* 1.0f is max value used in floating point calculation */ \
                 /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
                 /* srcA without Multiply with glyph mask. */ \
                 resA = SRC_PREFIX ## A; \
                 MultiplyAndStore4ByteArgbComps(res, \
                                                SRC_PREFIX ## A, \
                                                SRC_PREFIX); \
             } \
             /* Evaluating dstColor components */ \
             /* This is required only when resA(i.e., srcA) is not maximum */ \
             if (resA != MaxValFor4ByteArgb) { \
                 DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
                 DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
                 DeclareCompVarsFor4ByteArgb(dst) \
                 DeclareCompVarsFor4ByteArgb(tmp) \
                 /* Based on the pixelFormat we could reduce calculations */ \
                 /* done to load the color and alpha components */ \
                 if (!(DST ## IsPremultiplied)) { \
                     Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
                                                dstA, dstR, dstG, dstB); \
                     Store4ByteArgbCompsUsingOp(tmp, =, dst); \
                 } else { \
                     Declare ## DST ## AlphaLoadData(DstPix) \
                     jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
                     DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
                                                                 pixelOffset); \
                     /* The macro's implementation loads color components  */ \
                     /* without divide by alpha adjustment as required for */ \
                     /* subsequent calculations. Note: This is used only   */ \
                     /* with preMultiplied alpha based pixel formats */ \
                     LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
                                                          DstPix, \
                                                          dst); \
                     Postload4ByteArgbFrom ## DST(pixelAddress, \
                                                  DstPix, \
                                                  tmp); \
                 } \
                 /* Avoid blending operatons if dst is fully transparent */ \
                 if (dstA) { \
                     /* dstA would be 0 if either of the following is true. */ \
                     /* 1. srcA is max. Parent if condition validates this. */ \
                     /* 2. dstA is zero. The current if condition validates */ \
                     /* Henceforth, the following Multiply need not be moved*/ \
                     /* ahead of the if condition. This also helps to better*/ \
                     /* performance */ \
                     dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
                     resA += dstA; \
                     /* Declares blendF variable and assigns appropriate */ \
                     /* alpha value. The definitions are contained in the*/ \
                     /* header files of respective pixel formats */ \
                     DeclareAndInit ## DST ## SrcOverDstBlendFactor(dstF, \
                                                                    dstA, \
                                                                    blendF); \
                     /* This is understood easier with floating point logic.*/ \
                     /* 1.0f is max value used in floating point calculation*/ \
                     /* (1.0f * tmp) gives tmp. Hence we avoid 3 Multiply   */ \
                     /* operations and add the loaded color to result       */ \
                     if (blendF != MaxValFor4ByteArgb) { \
                         MultiplyAndStore4ByteArgbComps(tmp, \
                                                        blendF, \
                                                        tmp); \
                     } \
                     Store4ByteArgbCompsUsingOp(res, +=, tmp); \
                 } \
             } \
             /* In the above calculations, color values are multiplied with */ \
             /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
             /* based pixel formats. For non pre-multiplied alpha based     */ \
             /* pixel formats, the alpha is removed from color components   */ \
             /* and then stored to the resulting color.  */ \
             if (!(DST ## IsOpaque)&&  \
                 !(DST ## IsPremultiplied)&&  resA&&  \
                 resA<  MaxValFor4ByteArgb) \
             { \
                 DivideAndStore4ByteArgbComps(res, res, resA); \
             } \
             Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
                                                PIXEL_INDEX, res); \
         } \
     } while (0);

My apologies if the above code did not appear on the final webrev email.
( In few instances, the newlines don't appear in plain-text format )

Kindly review the changes present in webrev and provide your views.
If the changes are good, we could take up for the code check-in.

Thank you for your time in review
Have a good day

Prahalad N.



-----Original Message-----
From: Jim Graham
Sent: Tuesday, April 05, 2016 3:07 AM
To: Prahalad Kumar Narayanan; Sergey Bylokhov; Philip Race
Cc: Praveen Srivastava
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts

Hi Prahalad,

Can I see the webrev diffs for the "today's experiment" code with the new macro?

Also, Did you test this with opaque destinations?  The most common use
of text is on an opaque destination, so that would matter more.  I
would
suggest: INT_RGB, THREE_BYTE_BGR, INT_ARGB, INT_ARGB_PRE in that order of 
precedence of importance.  Also, test with translucent colors, though those are 
less important than opaque colors.

I'm still looking at why the non-pre would be slower than the pre.
About the only difference is the one line "if (!PRE) { DSTF = DSTA; }".
   One suggestion might be to move the test for transparent destination up a couple of 
lines, and to get rid of the extra "DSTF = dstA"
assignement with the following structure:

        dstA = Mult...();
        if (dstA) {
            resA += dstA;
            Declare...
            Postload...
            if (DST ## IsPremultiplied) {
                MultiplyAndStore(..., DSTF, ...);
            } else {
                MultiplyAndStore(..., dstA, ...);
            }
            Store...
        }

Basically, dstA == 0 is the actual test for "do we need to even try to blend the 
destination in here".  If it is zero then there is no need to add dstA to resA and 
there is no need to adjust the factor we blend by (MultiplyAndStore).  It can be 
triggered by either a transparent destination pixel or an opaque source pixel, but either 
way, dstA is the right test, not DSTF.  The second part, eliminating the DSTF=dstA 
assignment, gets rid of a line that might trip up the optimizer by simply having the 
macro expand differently for the two types.  To be even more streamlined, we could create 
a new set of macros:

// In the header files for PRE types:
#define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dF)
// In the header files for non-PRE types:
#define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dA)

Then we wouldn't need the test above for "if (DST ## Pre)", we would just 
expand the macro with:
        MultiplyAndStore(..., SRCOVER_DST_BLEND_FACTOR ## DST(DSTF, dstA), ...) 
which would hopefully confuse the optimizer even less.

If you want to really eliminate the pixel address computations, you could rewrite the 
calling loop so that it steps along a pixel pointer rather than using indexing.  Have the 
calling function/macro set up a pRas pointer to the pixel and step that along the row by 
TYPE##PixelStride as it iterates across the glyph, then hand that into the Glyph blend 
macro as DST_PTR (and eliminate PIXEL_INDEX as it would be "0" in this case)...

                        ...jim

On 4/4/16 4:37 AM, Prahalad Kumar Narayanan wrote:

Dear Jim

Good day to you.

( Just in-case, you had missed my long Friday 's email )

Quick Recap of Proceedings

1.On Friday, I had profiled two solutions that fix the bug-
JDK-8015070, using Java2D Bench

2.Profiling was done for 16 test cases with different combinations of

a.Font Style: Plain, Bold, Italic, Bold-Italic

b.Font Size: 12, 20, 36, 72

3.Result from Friday 's profiling experiments:

a.Regular Solution (Un-optimized) : This was observed to be faster
for IntArgb pixel format

b.Optimized Solution (based on SrcOver_MaskFill with fast path) :
This was observed to be faster for IntArgb_Pre pixel format

Update from Today's Experiments

1.First, I understood that new calculations introduced (pixelAddress
computation) impacted performance of optimized solution in IntArgb format.

2.Henceforth, I made the following changes, while loading destination color:

a.Check if the pixel format is PreMultiplied

b.If the format is preMultiplied, then>  take up new calculations and
use LoadAlphaFrom ## DST ## For4ByteArgb macro that does *Not* cause
divide by alpha adjustment

c.If the format is regular Argb, then>  take up loading of colors
using standard Load ## DST ## To4ByteArgb

3.Once the release build was available, Java2D Bench was re-run
(using pre-saved options file)

Result from Bench metrics:

a.In most of the test cases, the optimized solution has higher metric :
Avg characters/ second for both IntArgb and IntArgb_Pre formats

b.In 6 / total-16 test cases, optimized solution was marginally lower
than the metrics of Regular un-optimized algorithm (only for
IntArgb_Pre)

c.However, J2DAnalyzer reported that even these 6-test cases were
within 10% deviation. Hence the algorithms were categorized to be
'same' in performance.

Suggestion Required

1.The attached zip file, contains Algorithms.cpp - Which lists down
different algorithms profiled so far.

2.I 've introduced comments within the macro to explain the change.

a.Note: These comments will be removed from the final source code to
be checked in.

3.Kindly review the latest algorithm (for any issues/ bugs) and
provide your suggestions.

a.latest algorithm can be easily traced by searching for
"LoadOptimized Algorithm v3" within the file.

Thank you for your time in review&  detailed feedback that you get
every time.

Every such review improves the quality of code&  the solution

Prahalad N.

*From:* Prahalad Kumar Narayanan
*Sent:* Friday, April 01, 2016 5:07 PM
*To:* Jim Graham; Sergey Bylokhov; Philip Race
*Cc:* Praveen Srivastava
*Subject:* RE: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts

Dear Jim

Good day to you.

Thanks for your suggestions in the reviews that has evolved the fix
to the bug.

As guided by you, I measured the performance of Text Rendering with
Text Antialiasing Hint on using Java2D Bench Tool.

Solutions Profiled:

          We have two solutions - Un optimized solution and Optimized
solution modelled as per SRCOVER_MASKFILL

          ( Both the solutions were profiled on windows x86_64 with
JDK 9-internal Release build )

Test Cases Profiled:

          With Font set to : Lucida sans, different combinations of

               Font Styles : Plain, Bold, Italic, Bold Italic&&

               Font Sizes   : 12, 20, 36, 72 points were tested.

Attached herewith: JDK8015070-Profiling Data.zip

The archive contains:

          1. Algorithms.cpp     : Just to have a quick glance of all the
algorithms that we have tried so far.

          2. *.txt Files               : For each test, Java2D bench
reports the average metrics/second.

                                                         The text file
contains collection of all such average metric for nearly 16
different test cases.

          3. res Output            : .res output from each of the test runs.

Observation from J2DBench Reports

          1.  I could not get time to run the Analyzer tool across
each of these *.res files. I shall do them on Monday.

          2.  From the summary text ( average chars per. Second ) that
J2DBench reported,

                      Un-optimized solution seems to be better for
IntArgb pixel format and

                      Optimized solution is better for IntArgb_Pre
pixel format by significant margin.

          3.  I tried to improve the optimized algorithm (based on
SRCOVER_MASKFILL ) further by adding a if (dstA) { ...

                      Though there is a marginal improvement, the
optimized solution could not exceed numbers of regular algorithm (for
IntArgb pixel format)

                      This could be due to the extra calculations that
we do in-order to load color components separately.

                      However, for IntArgb_Pre pixel format, the
optimized solution is way-ahead and gives better performance.

            4. In the summary reports, you will find
CompatibleBufferedImage( Translucent ) as well.

                      In a separate experiment, I found that the pixel
format for compatible buffered image got mapped IntArgb_Pre.

                      Thus, the performance numbers match with that of
IntArgb_Pre pixel format

At the present moment, I 'm confused on the solution would be better
to fix the Bug.

Kindly share your views and thoughts

Thank you

Have a good day

Prahalad N.

-----Original Message-----
From: Jim Graham
Sent: Thursday, March 31, 2016 1:46 AM
To: Prahalad Kumar Narayanan; [email protected]
<mailto:[email protected]>; Sergey Bylokhov
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts

Hi Prahalad,

Benchmarks must run for a significant period of time to be valid.

Measuring with nanoTime() is fine, but the run times should exceed at
least a couple of seconds, not a few nanoseconds.  You might want to
use Java2DBench instead (in src/demo/share/java2d in the java.desktop
repo), which calibrates test times, does multiple runs, and includes
an analyzer to compare results from multiple test runs...

                                  ...jim

On 3/30/2016 4:27 AM, Prahalad Kumar Narayanan wrote:

Hello Jim and Everyone on Java2D Group
Good day to you.
A quick follow-up to Review Request on bug:
        Bug          : JDK-8015070
        Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
Thank you Jim for the detailed feedback.
It takes a lot of time not only in performing the review, but also in getting 
the feedback with clear words.
In each review, the solution definitely gets better&  better. I 'm
happy about it...! :)
Quick Inferences from previous feedback:
Incorporating the fast path into current logic:
         1.  I agree with you on this point and I noticed this when we modelled 
the solution as per the mask fill.
         2.  I ignored it initially for two reasons,
                       a.  The statement - if (resA != MaxValFor4ByteArgb)...  
follows srcColor pre-multiplication step and this will ensure to skip most of 
the blending calculations pertaining to destination.
                       b.  Any addition / tweaks to current logic, might 
deviate from the SRCOVER_MASKFILL reference model.
                                Many a time, managing code with similar logic 
across implementation helps a lot.
        3.   As you said, including the fast path will avoid few 
multiplications and if checks too.
              The changes are available in the current webrev.
                       Link:
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
Profiling method, and Metrics:
        1.  The profiling method that was done yesterday was mere
execution of the regression test (available in the webrev) and time
measured with System.currentTimeMillis API
                        Since only one iteration was run, the time soared into 
milli seconds. This could be due to internal glyph caching mechanism.
        2.  Today, I evolved the regression test, into a benchmark that does 
the following:
                  a.  Select Font style : {Plain, Bold, Italic, Bold
Italic}
                  b.  Select Font size : {20, 40, 60, 80}
                  c.  Trigger drawstring once before benchmark is run. This is 
to ensure, the font subsystem is done with glyph caching internally.
                  d.  Use famous string that has all characters-" The quick brown 
fox jumps over the lazy dog. 0123456789. "
                  e.  For every style-size combination - run the test
for 20 iterations and take the average. (Note: Font is fixed to
'verdana' )
                  f.  Modify the precision from System.currentTimeMillis to 
System.nanoTime() and reduce elapsedTime to micro seconds.
        3.  With the above setup in code, my observation on windows is as 
follows:
                  a.  In many cases, The optimized logic is either equal-to 
(or) better in performance than the un-optimized logic.
                               The difference is very minimal - few tens to few 
hundreds of micro-seconds.
                  b.  The optimized algorithm improves performance for 
Pre-multiplied alpha based destination buffer.
                  c.   There are occasional huge deviations where optimized 
logic seems to take longer time.
                               These could be due to external factors
like- stalls for memory, bus io etc.,
                               Since, the deviation is in micro seconds, I 
believe, it may not be a concern.
                  d.  The complete list of time measurement taken up
on windows x86_64 release build is as-follows-
                               I 'm not sure, how the data appears in the final 
webrev-email.
                               Kindly excuse, if the data is un-readable.
Platform : Windows x86_64 Release Build Algorithm : Unoptimized.
webrev.00
````````````````````````````````````````````````````````````````````
`
`
```` Executing Bench For Image Type: IntArgb
   -Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742
   -Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395
   -Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474
   -Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079
   -Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380
   -Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301
   -Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631
   -Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812
   -Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059
   -Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180
   -Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558
   -Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169
   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
426.988
   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
374.064
   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
732.375
   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
864.68
Executing Bench For Image Type: IntArgb_Pre
   -Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768
   -Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299
   -Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941
   -Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372
   -Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096
   -Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589
   -Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972
   -Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894
   -Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028
   -Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300
   -Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379
   -Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139
   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
93.243
   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
475.406
   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
280.085
   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
357.486
Platform : Windows x86_64 Release Build Algorithm : Optimized.
webrev.03
````````````````````````````````````````````````````````````````````
`
`
```` Executing Bench For Image Type: IntArgb
   -Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954
   -Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871
   -Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799
   -Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390
   -Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566
   -Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054
   -Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735
   -Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559
   -Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289
   -Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750
   -Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181
   -Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172
   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
375.336
   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
571.371
   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
548.300
   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
714.526
Executing Bench For Image Type: IntArgb_Pre
   -Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026
   -Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016
   -Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617
   -Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829
   -Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809
   -Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563
   -Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049
   -Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802
   -Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320
   -Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473
   -Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488
   -Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782
   -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
54.687
   -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
235.505
   -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
227.205
   -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
324.308
Updated webrev with changes for the fast-path :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
Kindly review and provide your suggestions.
Thank you once again for detailed review and feedback Have a good
day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Wednesday, March 30, 2016 2:46 AM
To: Prahalad Kumar Narayanan;[email protected]
<mailto:[email protected]>; Sergey Bylokhov
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
This latest version looks like it should produce correct answers.
I'd like to see benchmark results on more platforms, particularly Windows since 
the different compilers may optimize these things differently.
Also, you didn't mention what data set you used for benchmarking.
I'd
like to see benchmark results for small, medium and large font
sizes,
and possibly bold and italic fonts as well.  The reason is that the
relative ratios of "empty glyph pixels" to "partial glyph pixels" to
"fully covered glyph pixels" changes depending on the font type and
size so if you made one of those faster and another slower then the
results may be seen as a gain in one type of test if you only test
one
font type and size and it happens to match the part of the code that
is more streamlined.  Also, for small font sizes the per-glyph
overhead might hide per-pixel issues.  Please share which fonts and
sizes you used for testing and consider using some different sizes,
including something very large like 36 or 48 points (something with
large areas of opaque
pixels) as well as more normal sizes that appear in GUIs.  Also, bold fonts can 
have a higher percentage of opaque pixels.
In particular...
This latest version is missing the "fast path" from the existing code for the 
case of srcA == 255 and glyphA == 255 where it just stores the FG_PIXEL values directly.  
For large fonts/glyphs that case may be as important as detecting empty glyph pixels.
On the other hand, an additional "if" may cause the compiler to generate less efficient 
code as per Sergey's concern.  Since this "if" eliminates some multiplies and possibly 
some divides, I'd hope it would be a win-win.
You could add the fast path back inside the case where mixValSrc is 255 and just test 
srcA for 255 and do the Store ## DST ## PixelData() macro that used to be at the end of 
the block in the existing code, and then use "break;" to escape out of the 
do/while surrounding the whole macro so it moves on to the next pixel.
(Arguably, we might want to consider teaching our SRCOVER_MASKFILL
to
do the same thing.  I think that was one of the added values of
having
a separate GLYPH loop, but really both should be optimizing that
case...)
I can see now that the macro switch to use the same macro set as SRCOVER_MASKFILL required you to 
compute the pixel address, as you noted in your summary.  It makes the new macro more cumbersome 
and makes it look like it's doing a bit more work per-pixel, but in reality I think the overall 
operations end up being the same as long as the compiler optimizes the deliberate multiplications 
the same way it did for implicit multiplications in the "pRas[foo]" and 
"pRas[foo*4]" code that was being generated previously.  Benchmarks will tell us if we're 
good there...
                                             ...jim
On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote:

Hello Everyone on Java2D Group
Good day to you.
This is a follow-up to Review Request on bug:
       Bug          : JDK-8015070
       Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
First, Thanks to Jim and Sergey for their feedback on the changes so far.
Inferences from Jim 's Feedback on Loading destination colors:
       1.  The API or Macro that I had earlier used to Load DST colors, indeed, 
adjusted destination color components with divide-by-alpha if destination was 
already pre-multiplied.
                    My apologies.. I should have spotted this error at the 
first iteration itself.
       2.  Due to the divide-by-alpha adjustment, the remaining
logic would become incorrect. ( Especially, the multiplication with
dstF based on pre-mulitplication status )
       3.  Correct API is being used now, and the dstColor components are 
loaded directly without any divide-by-alpha adjustment.
Inferences from Sergey's Feedback on Performance.
       1.  To set the context for everyone, the logic present in the current 
webrev.02 is modelled as per SRCOVER_MASKFILL.
                    There are multiple if (...) conditions that remove 
un-necessary blending calculations. Ideally this should improve performance.
                    However, since some data are not readily available (as 
present in SRCOVER_MASKFILL), few additional calculations have been added.
                    They are: pre-multiplying srcColor with alpha and assigning 
to res.
                                        Finding the correct address of Pixel 
using DST_PTR and PixelStride.
                    Henceforth, as Sergey suggests, Observation on performance 
will be beneficial.
       2.  The performance of the new logic was measured with
linux-x86_64-normal-server-release config and compared with the
logic used in un-optimized code in webrev.00
       3.  Result: The newer logic provides a fractional gain (about 15 - 20 
ms) over the older logic.
Other Subtle Changes:
       1.  The test file has been renamed from
AADrawStringArtifact.java to more meaningful -
AntialiasedTextArtifact.java
       2.  The test file tests for both TYPE_INT_ARGB and TYPE_INT_ARGB_PRE 
BufferedImage formats.
                   The code has been well commented to explain the logic used 
in every function.
Kindly take your time to review the changes in the webrev link mentioned below 
and provide your suggestions.
            Webrev Link:
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/
Thank you for your time in review
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Thursday, March 24, 2016 7:57 AM
To: Prahalad Kumar Narayanan;[email protected]
<mailto:[email protected]>
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
(On a side note - ouch!  I came up with these macros in the first
place, but 20 years later I'm now realizing just how hard they are
to
navigate and review.  My apologies...)
The macro you are still using to load the destination data is one that loads it 
to non-premultiplied components, which means it will divide out the alpha if 
the destination is PRE.  The rest of the logic assumes that the components were 
loaded without any adjustment of their premultiplication so not only is that 
division an unnecessary operation, it makes the following math wrong.
The SRCOVER_MASKFILL macro seems to use "Postload ## STRATEGY ## From ## TYPE" which 
seems to load them into separate components without any adjustment of their pre-multiplication 
status.  This is paired with "LoadAlphaFrom ## TYPE ## For ## STRATEGY" to load just the 
destination alpha for computing dstF...
                                           ...jim
On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote:

Hello Everyone on Java2D Group
Good day to you.
This is a Follow-up to Review Request on the bug:
        Bug          : JDK-8015070   Anti-aliased Text on Translucent 
background gets bright artifacts
        Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
First, Sincere thanks to Jim for his valuable feedback.
        1.  As Jim correctly mentioned, SRCOVER_MASKFILL has a similar logic to 
blend two Translucent colors based on an Alpha mask.
        2.  The calculations are exactly the same as the changes in previous 
webrev.
                    However the logic of SRCOVER_MASKFILL is 'Optimized' to 
reduce the number of computations.
        3.  This optimization is definitely required because, the logic is 
executed for every single pixel in a glyph.
                    Example: If a string is made up of 5 English
characters with each character having 32 x 32 pixels,
                                       The anti-aliasing logic will be executed 
for 5 x 32 x 32 iterations.
                    Reducing computation per pixel will imply a huge benefit 
for complete drawString operation.
Observation from SRCOVER_MASKFILL
        1.  The mask fill reduces computations by through multiple if(...) 
conditions.
                      Each if condition affirms whether the next set of 
computations are required.
        2.  Optimization 1: If mask value is 0- skip entire logic.
        3.  Optimization 2: If mask value is maximum, say 255, take
srcA directly without multiplying with maskAlpha ( Reason: 1 *
srcAlpha = srcAlpha )
        4.  Optimization 3: Compute pre-multiplied resColor in two steps. First 
with srcColor and then with dstColor.
                      If the resAlpha from 1st step (i.e., srcColor) is fully 
opaque, avoid blending dstColor altogether.
Changes in Current Webrev.01
        1.  The fix for the current bug is modelled based on SRCOVER_MASKFILL.
        2.  The changes have been verified to work on windows, linux and mac 
operating systems.
        3.  The automated Test file- AADrawStringArtifact.java runs
as expected
                   Identifies artifact&  throws exception when run on JDK 7 and 
8.
                   With JDK9, the test file returns without error.
        3.  JPRT build has been run to ensure, changes build on all supported 
platforms.
                    JPRT job link :
http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.p
ra

<http://scaaa637.us.oracle.com/archive/2016/03/2016-03-22-070604.pra>

h
n
ara-linux.client
Kindly review the changes in the below mentioned link and provide
your views
        Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/
Thank you for your time in review
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Friday, March 18, 2016 6:07 AM
To: Prahalad Kumar Narayanan;[email protected]
<mailto:[email protected]>
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
This basically boils down to "alpha blending math needs to be performed in 
premultiplied form" and the existing code was not doing that.
Your changes do add a pre-multiplication step to the math in two
places
- when you mix the src alpha and the glyph alpha at the top of the macro, and 
again when you do the Multiply(dst, dstA, dst) step in the middle.  In that 
sense, the new math is correct.
However, it is not the optimal way to implement this.  In
particular, the macro used here to load the data from the
destination is the one that loads it into 4 ARGB non-premultiplied
values.  If the destination is non-PRE, then your added multiply
step is exactly what is needed, but it could be combined with the
multiply that will happen later in the blending equation, so it is
an added step rather than a fixed fraction in the existing MultMultAdd 
parameters.
Additionally, if the destination is already PRE, then the macro
being used to load the dst pixel data there will have done a
divide
step to divide out the alpha from the destination, only to have
you
reverse that math with your new
Multiply() step.  That's a lot of math to end up with a NOP.
The MUL8 you added for the srcA and glyph value is needed, that change is good. 
 Since it is common for glyph pixels to have zero alpha, you might want to test 
the glyph alpha for 0 and skip the pixel before you do the multiply, though.  
This would add one more if, but it would be a common case.
The trickier part is to load the destination components without un-premultiplying them.  
Unfortunately there is no "Load...ToArgbPre"
macro to parallel the Load macro used in the function.  Perhaps there should 
be, but you'd still end up with an extra multiply step as I mentioned above 
because you can fold the premultiplication of the dst data into the MultMultAdd 
by carefully choosing the parameters you use in the math there.
The good news is that the way that the SRCOVER_MASKFILL uses the various 
type-specific macros works around this a bit and minimizes the number of 
multiplies that happen.  You could check out DEFINE_SRCOVER_MASKFILL and see 
how it works in the case where pMask is not null (pMask is an alpha mask with 
values very similar to the glyphAA data).  Modeling this code on that code 
would correct the math and minimize it as well...
                                        ...jim
On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote:

Hello Everyone on Java2D Group
Good day to you.
Herewith, I 'm sharing the webrev for two identical Java2D Bugs.
Bug ID : JDK-8015070
            Title     : Antialiased text on translucent backgrounds gets
bright artifacts
            Link      :https://bugs.openjdk.java.net/browse/JDK-8015070
Bug ID : JDK-8013564 ( currently closed as duplicate )
            Title     : Font rendering anti-aliasing in translucent
BufferedImages broken
            Link      :https://bugs.openjdk.java.net/browse/JDK-8013564
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
Quick Summary on Bugs :
````````````````````````````````````````````````
1.  Artifacts appear on Edges of text characters when
anti-aliased
text is drawn on Translucent background
2.  The issue is reproducible on all platforms - windows, linux and mac os.
3.  Besides, the issue is reproduced with the commonly used pixel
format- 4ByteArgb.
Root Cause&  Solution :
````````````````````````````````````````````````
1.  The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h
uses
the standard blending algorithm
             dstColor = [ srcColor * glyphAlpha + dstColor * (1 -
glyphAlpha) ] / dstAlpha
2.  The above equation works only when the srcColor and dstColor are Opaque.
3.  When srcColor and dstColor are Translucent, their
alphaComponent will influence the visibility of the color, and
visibility of the color below.
4.  The new set of calculations for blending Translucent source
and
destination colors is given as
             resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha))
             resColor = [ srcColor * srcAlpha + dstColor *
dstAlpha
*
(1
-
srcAlpha) ] / resAlpha
5.  Reference text for the equation:
https://en.wikipedia.org/wiki/Alpha_compositing
6.  With the above modification to the blending logic, the
artifacts do not appear and issues are fixed.
Jtreg&  Jprt Results :
````````````````````````````````````````````````
1.  A simple test-file: AADrawStringArtifact.java has been
created
to be a part of jtreg test cases.
              The test file is well commented to explain - nature
of
artifact and how the test tries to identify them.
              As required, the test case fails with Jdk 7, Jdk 8
and
succeeds with Jdk 9-internal (built with changes for the bug fix)
2.  The changes to blending logic lies within
java.desktop/src/share/native/...
              Henceforth, JPRT was used to ensure successful build
across all supported platforms
              Jprt Job Link :
http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001.
pr

<http://scaaa637.us.oracle.com/archive/2016/03/2016-03-17-072001.pr>

a
h
n
ara-linux.client
              The build succeeds on all platforms.
Kindly review the webrev link and provide your views and suggestions.
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
If the changes look good, we could take up the changes for source checkin.
Thank you for your time in review
Have a good day
Prahalad N.

Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts

Reply via email to