Hello Jim and everyone in the Java2D community,
Good day to you.
This is a review request and a follow-up to the bug-fix for the issue
Bug : JDK-8015070
Link : https://bugs.openjdk.java.net/browse/JDK-8015070
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.04/
Quick inferences on the changes in the current webrev.04
1 ) Subtle changes to the blend loop
. These subtle changes have helped to improve the performance.
. With the current logic in webrev.04, Java2DBench reports better
performance than the un-optimized solution that was present in webrev.00.
. J2DBench was run for Font Styles { Plain, Bold, Italic, Bold-Italic }
and Font Sizes { 12, 20, 36, 72 }.
. My sincere thanks to Jim for all his detailed feedback through the
multiple reviews that have evolved the solution to where it is today.
(Details on changes)
1.a. Loading of color components
. When modelled as per the SRCOVER_MASKFILL code, the logic required a few
additional calculations to load the color components.
. These extra calculations did impact the performance figures.
. This could be offset in two possible ways:
a. Inspect the parent macro, NAME_SOLID_DRAWGLYPHLISTAA, and
advance by pixel address and not by pixel index.
The parent macro invokes GlyphListAABlend4ByteArgb through
this macro: GlyphListAABlend ## STRATEGY(DST, pixels, x, pPix,
fgpixel, solidpix, src);
Changing the parent macro would cause spurious changes across the
GlyphListAABlend ## STRATEGY macros of the other pixel formats.
There is an additional risk of breaking the stable and well
optimized LoopMacros.h.
b. Load the color components based on pre-multiplication status.
This has been taken up, and the change is limited to the
function being modified.
Thankfully, J2DBench still reports an improvement in performance.
1.b. New macro to avoid if (DST ## IsPremultiplied) {
. A new macro, DeclareAndInit ## DST ## SrcOverMaskBlendFactor, has
been introduced to choose between dstF and dstA.
. The implementation is available in the header files of the pixel formats
( e.g. IntArgb.h, IntArgbPre.h ).
. There are 29 different pixel formats known to Java2D. Hence, the new
macro's implementation is added only to those pixel formats that
require the current glyph blending logic.
2 ) Testing across different formats
. The regression test code has been modified to test anti-aliased
text rendering on 7 different pixel formats:
IntArgb, IntArgb_Pre, FourByteAbgr, FourByteAbgr_Pre,
IntRGB, IntBGR, 3ByteBGR.
. As expected, the test fails without the fix on JDK 8 and JDK 7
and passes with JDK 9-internal containing the fix.
3 ) Explanation of the code changes :
. After multiple reviews and changes, the code now fixes the bug and
is well optimized too.
. To ease the review effort, I've explained the logic with
/* comment statements */ in the macro that follows.
. Note: These comments do not appear in the webrev, as one cannot
guarantee how /* comments */ within macros would be handled by
compilers across different platforms.
#define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
                                  FG_PIXEL, PREFIX, SRC_PREFIX) \
    do { \
        DeclareAlphaVarFor4ByteArgb(resA) \
        DeclareCompVarsFor4ByteArgb(res) \
        jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
        if (mixValSrc) { \
            /* Evaluating srcColor components */ \
            if (mixValSrc != 0xff) { \
                /* No-op. Retained to match the SRCOVER_MASKFILL reference */ \
                PromoteByteAlphaFor4ByteArgb(mixValSrc); \
                /* Glyph mask determines visibility of the srcColor */ \
                resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
                MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
            } else { \
                /* If mixValSrc and srcA are maximum, then result color is */ \
                /* opaque. Hence the fast path to store foreground pixel */ \
                /* color and return. */ \
                if (SRC_PREFIX ## A == MaxValFor4ByteArgb) { \
                    Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
                                              FG_PIXEL, PREFIX); \
                    break; \
                } \
                /* This is understood easier with floating point logic. */ \
                /* 1.0f is max value used in floating point calculation */ \
                /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
                /* srcA without Multiply with glyph mask. */ \
                resA = SRC_PREFIX ## A; \
                MultiplyAndStore4ByteArgbComps(res, \
                                               SRC_PREFIX ## A, \
                                               SRC_PREFIX); \
            } \
            /* Evaluating dstColor components */ \
            /* This is required only when resA(i.e., srcA) is not maximum */ \
            if (resA != MaxValFor4ByteArgb) { \
                DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
                DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
                DeclareCompVarsFor4ByteArgb(dst) \
                DeclareCompVarsFor4ByteArgb(tmp) \
                /* Based on the pixelFormat we could reduce calculations */ \
                /* done to load the color and alpha components */ \
                if (!(DST ## IsPremultiplied)) { \
                    Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
                                               dstA, dstR, dstG, dstB); \
                    Store4ByteArgbCompsUsingOp(tmp, =, dst); \
                } else { \
                    Declare ## DST ## AlphaLoadData(DstPix) \
                    jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
                    DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
                                                                pixelOffset); \
                    /* The macro's implementation loads color components */ \
                    /* without divide by alpha adjustment as required for */ \
                    /* subsequent calculations. Note: This is used only */ \
                    /* with preMultiplied alpha based pixel formats */ \
                    LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
                                                         DstPix, \
                                                         dst); \
                    Postload4ByteArgbFrom ## DST(pixelAddress, \
                                                 DstPix, \
                                                 tmp); \
                } \
                /* Avoid blending operations if dst is fully transparent */ \
                if (dstA) { \
                    /* The multiplied dstA would be 0 if either of the */ \
                    /* following is true. */ \
                    /* 1. srcA is max. Parent if condition validates this. */ \
                    /* 2. dstA is zero. The current if condition validates */ \
                    /*    this. */ \
                    /* Hence, the following Multiply need not be moved */ \
                    /* ahead of the if condition. This also helps to better */ \
                    /* performance */ \
                    dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
                    resA += dstA; \
                    /* Declares blendF variable and assigns appropriate */ \
                    /* alpha value. The definitions are contained in the */ \
                    /* header files of respective pixel formats */ \
                    DeclareAndInit ## DST ## SrcOverDstBlendFactor(dstF, \
                                                                   dstA, \
                                                                   blendF); \
                    /* This is understood easier with floating point logic. */ \
                    /* 1.0f is max value used in floating point calculation */ \
                    /* (1.0f * tmp) gives tmp. Hence we avoid 3 Multiply */ \
                    /* operations and add the loaded color to result */ \
                    if (blendF != MaxValFor4ByteArgb) { \
                        MultiplyAndStore4ByteArgbComps(tmp, \
                                                       blendF, \
                                                       tmp); \
                    } \
                    Store4ByteArgbCompsUsingOp(res, +=, tmp); \
                } \
            } \
            /* In the above calculations, color values are multiplied */ \
            /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
            /* based pixel formats. For non pre-multiplied alpha based */ \
            /* pixel formats, the alpha is removed from color components */ \
            /* and then stored to the resulting color. */ \
            if (!(DST ## IsOpaque) && \
                !(DST ## IsPremultiplied) && resA && \
                resA < MaxValFor4ByteArgb) \
            { \
                DivideAndStore4ByteArgbComps(res, res, resA); \
            } \
            Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
                                               PIXEL_INDEX, res); \
        } \
    } while (0);
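For readers who find the token-pasted macro hard to follow, the same per-pixel flow can be sketched as an ordinary C function. This is only an illustration and not the code in the webrev: the Argb struct, the argb(), mul8() and div8() helpers (with their rounding) and the dstIsPre flag are assumptions made for this sketch, and the opaque-destination case (DST ## IsOpaque) is left out.

```c
#include <assert.h>

typedef struct { int a, r, g, b; } Argb;   /* each component in 0..255 */

/* Stand-ins for the 8-bit multiply/divide helpers (rounding is assumed). */
static int mul8(int a, int b) { return (a * b + 127) / 255; }
static int div8(int c, int a) {
    int v = (c * 255 + a / 2) / a;
    return v > 255 ? 255 : v;
}

static Argb argb(int a, int r, int g, int b) {
    Argb p;
    p.a = a; p.r = r; p.g = g; p.b = b;
    return p;
}

/*
 * Blend one glyph mask sample over one destination pixel (SrcOver).
 * src is non-premultiplied; dst and the result are premultiplied when
 * dstIsPre is non-zero and non-premultiplied otherwise.
 */
static Argb blend_glyph_pixel(Argb src, Argb dst, int mix, int dstIsPre)
{
    Argb res;
    int resA;

    assert(mix >= 0 && mix <= 255);
    if (mix == 0) {
        return dst;                    /* glyph does not touch this pixel */
    }
    if (mix == 255 && src.a == 255) {
        return src;                    /* fast path: store the foreground */
    }
    /* Source contribution, premultiplied by (mask * srcA). */
    resA = (mix == 255) ? src.a : mul8(mix, src.a);
    res.r = mul8(resA, src.r);
    res.g = mul8(resA, src.g);
    res.b = mul8(resA, src.b);

    /* Destination contribution: skipped for an opaque source result
     * or a fully transparent destination. */
    if (resA != 255 && dst.a != 0) {
        int dstF = 255 - resA;
        int dstA = mul8(dstF, dst.a);
        int blendF = dstIsPre ? dstF : dstA;
        resA += dstA;
        res.r += mul8(blendF, dst.r);
        res.g += mul8(blendF, dst.g);
        res.b += mul8(blendF, dst.b);
    }
    /* Non-premultiplied formats store colors with alpha divided out. */
    if (!dstIsPre && resA != 0 && resA < 255) {
        res.r = div8(res.r, resA);
        res.g = div8(res.g, resA);
        res.b = div8(res.b, resA);
    }
    res.a = resA;
    return res;
}
```

With these assumed helpers, half-covered opaque white text over a fully transparent non-premultiplied pixel keeps its white color with alpha 128, which is the behavior the bug report asks for.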
My apologies if the above code does not appear properly in the final
webrev email. ( In a few instances, the newlines don't appear in
plain-text format. )
Kindly review the changes present in the webrev and provide your views.
If the changes are good, we could take them up for code check-in.
Thank you for your time in review.
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Tuesday, April 05, 2016 3:07 AM
To: Prahalad Kumar Narayanan; Sergey Bylokhov; Philip Race
Cc: Praveen Srivastava
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
Can I see the webrev diffs for the "today's experiment" code with the new macro?
Also, did you test this with opaque destinations? The most common use
of text is on an opaque destination, so that would matter more. I would
suggest: INT_RGB, THREE_BYTE_BGR, INT_ARGB, INT_ARGB_PRE in that order of
importance. Also, test with translucent colors, though those are
less important than opaque colors.
I'm still looking at why the non-pre would be slower than the pre.
About the only difference is the one line "if (!PRE) { DSTF = DSTA; }".
One suggestion might be to move the test for transparent destination up a couple of
lines, and to get rid of the extra "DSTF = dstA"
assignment with the following structure:
    dstA = Mult...();
    if (dstA) {
        resA += dstA;
        Declare...
        Postload...
        if (DST ## IsPremultiplied) {
            MultiplyAndStore(..., DSTF, ...);
        } else {
            MultiplyAndStore(..., dstA, ...);
        }
        Store...
    }
Basically, dstA == 0 is the actual test for "do we need to even try to blend the
destination in here". If it is zero then there is no need to add dstA to resA and
there is no need to adjust the factor we blend by (MultiplyAndStore). It can be
triggered by either a transparent destination pixel or an opaque source pixel, but either
way, dstA is the right test, not DSTF. The second part, eliminating the DSTF=dstA
assignment, gets rid of a line that might trip up the optimizer by simply having the
macro expand differently for the two types. To be even more streamlined, we could create
a new set of macros:
// In the header files for PRE types:
#define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA) (dF)
// In the header files for non-PRE types:
#define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA) (dA)
Then we wouldn't need the test above for "if (DST ## Pre)", we would just
expand the macro with:
MultiplyAndStore(..., SRCOVER_DST_BLEND_FACTOR ## DST(DSTF, dstA), ...)
which would hopefully confuse the optimizer even less.
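The macro-selection idea above can be tried in isolation with a small sketch. Everything here is hypothetical: the two per-format definitions, the mul8() stand-in and the generated scale_dst_* functions exist only for this example and are not taken from the JDK headers.

```c
#include <assert.h>

/* Hypothetical per-format definitions, one per "header file". */
#define SRCOVER_DST_BLEND_FACTOR_IntArgbPre(dF, dA) (dF)
#define SRCOVER_DST_BLEND_FACTOR_IntArgb(dF, dA)    (dA)

/* Generic code picks the definition by pasting the format name. */
#define SRCOVER_DST_BLEND_FACTOR(DST, dF, dA) \
    SRCOVER_DST_BLEND_FACTOR_ ## DST(dF, dA)

/* Stand-in for the 8-bit multiply (rounding is an assumption). */
static int mul8(int a, int b)
{
    assert(a >= 0 && a <= 255);
    return (a * b + 127) / 255;
}

/* Generates one function per format; no runtime "is premultiplied" test. */
#define DEFINE_SCALE_DST(DST) \
    static int scale_dst_ ## DST(int comp, int dstF, int dstA) \
    { \
        return mul8(SRCOVER_DST_BLEND_FACTOR(DST, dstF, dstA), comp); \
    }

DEFINE_SCALE_DST(IntArgbPre)
DEFINE_SCALE_DST(IntArgb)
```

Each generated function contains only the factor that its format needs, so the compiler never sees a branch on the premultiplication status.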
If you want to really eliminate the pixel address computations, you could rewrite the
calling loop so that it steps along a pixel pointer rather than using indexing. Have the
calling function/macro set up a pRas pointer to the pixel and step that along the row by
TYPE##PixelStride as it iterates across the glyph, then hand that into the Glyph blend
macro as DST_PTR (and eliminate PIXEL_INDEX as it would be "0" in this case)...
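As a rough illustration of the two loop shapes, here is a sketch that walks one row both ways. The function names, the sample row and the choice of summing the first byte of each pixel are made up for this example; the real loops live in LoopMacros.h and do the glyph blend instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed form: the pixel address is recomputed from x on every step. */
static unsigned sum_first_byte_indexed(const uint8_t *row, int width,
                                       int pixelStride)
{
    unsigned sum = 0;
    int x;
    assert(pixelStride > 0);
    for (x = 0; x < width; x++) {
        const uint8_t *p = row + (size_t)x * (size_t)pixelStride;
        sum += p[0];
    }
    return sum;
}

/* Pointer form: a pixel pointer is stepped by the stride, and the
 * per-pixel code would always use index 0. */
static unsigned sum_first_byte_stepped(const uint8_t *row, int width,
                                       int pixelStride)
{
    unsigned sum = 0;
    const uint8_t *p = row;
    int x;
    assert(pixelStride > 0);
    for (x = 0; x < width; x++, p += pixelStride) {
        sum += p[0];
    }
    return sum;
}

/* Made-up row of three 4-byte pixels. */
static const uint8_t SAMPLE_ROW[12] = {
    10, 1, 2, 3,
    20, 4, 5, 6,
    30, 7, 8, 9
};
```

Both forms visit the same bytes; whether the stepped form is measurably faster depends on how well the compiler already strength-reduces the indexed multiply.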
...jim
On 4/4/16 4:37 AM, Prahalad Kumar Narayanan wrote:
Dear Jim
Good day to you.
( Just in case you missed my long email from Friday )
Quick Recap of Proceedings
1. On Friday, I profiled two solutions that fix the bug
JDK-8015070, using Java2D Bench.
2. Profiling was done for 16 test cases with different combinations of
a. Font Style: Plain, Bold, Italic, Bold-Italic
b. Font Size: 12, 20, 36, 72
3. Result from Friday's profiling experiments:
a. Regular solution (un-optimized): This was observed to be faster
for the IntArgb pixel format.
b. Optimized solution (based on SRCOVER_MASKFILL with fast path):
This was observed to be faster for the IntArgb_Pre pixel format.
Update from Today's Experiments
1. First, I understood that the newly introduced calculations (pixelAddress
computation) impacted the performance of the optimized solution in the
IntArgb format.
2. Hence, I made the following changes while loading the destination color:
a. Check whether the pixel format is premultiplied.
b. If the format is premultiplied, take up the new calculations and
use the LoadAlphaFrom ## DST ## For4ByteArgb macro, which does *not* cause
the divide-by-alpha adjustment.
c. If the format is regular Argb, load the colors
using the standard Load ## DST ## To4ByteArgb.
3. Once the release build was available, Java2D Bench was re-run
(using a pre-saved options file).
Result from Bench metrics:
a. In most of the test cases, the optimized solution has a higher metric
(avg. characters/second) for both the IntArgb and IntArgb_Pre formats.
b. In 6 of the 16 test cases, the optimized solution was marginally lower
than the regular un-optimized algorithm (only for IntArgb_Pre).
c. However, J2DAnalyzer reported that even these 6 test cases were
within 10% deviation. Hence the algorithms were categorized as
'same' in performance.
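To make the "within 10% deviation" categorization concrete, a comparison of two scores could look like the sketch below. This is an assumption about the general idea only; it is not J2DAnalyzer's actual algorithm, and the function name and the choice of the base score as the reference are hypothetical.

```c
#include <assert.h>

/*
 * Returns 0 when test lies within tolerancePercent of base ("same"),
 * +1 when it is higher and -1 when it is lower.
 */
static int compare_scores(double base, double test, double tolerancePercent)
{
    double limit;
    assert(base > 0.0 && tolerancePercent >= 0.0);
    limit = base * tolerancePercent / 100.0;
    if (test > base + limit) {
        return 1;
    }
    if (test < base - limit) {
        return -1;
    }
    return 0;
}
```

For a throughput metric such as characters per second, a result of 0 would mean the two algorithms are treated as equivalent for that test case.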
Suggestion Required
1. The attached zip file contains Algorithms.cpp, which lists the
different algorithms profiled so far.
2. I've introduced comments within the macro to explain the change.
a. Note: These comments will be removed from the final source code to
be checked in.
3. Kindly review the latest algorithm (for any issues/bugs) and
provide your suggestions.
a. The latest algorithm can be easily found by searching for
"LoadOptimized Algorithm v3" within the file.
Thank you for your time in review and for the detailed feedback that you
give every time.
Every such review improves the quality of the code and the solution.
Prahalad N.
-----Original Message-----
From: Prahalad Kumar Narayanan
Sent: Friday, April 01, 2016 5:07 PM
To: Jim Graham; Sergey Bylokhov; Philip Race
Cc: Praveen Srivastava
Subject: RE: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Dear Jim
Good day to you.
Thanks for your suggestions in the reviews, which have evolved the fix
for the bug.
As guided by you, I measured the performance of text rendering with the
text antialiasing hint on, using the Java2D Bench tool.
Solutions Profiled:
We have two solutions: the un-optimized solution and the optimized
solution modelled as per SRCOVER_MASKFILL.
( Both solutions were profiled on Windows x86_64 with a
JDK 9-internal release build )
Test Cases Profiled:
With the font set to Lucida Sans, different combinations of
Font Styles : Plain, Bold, Italic, Bold Italic and
Font Sizes : 12, 20, 36, 72 points were tested.
Attached herewith: JDK8015070-Profiling Data.zip
The archive contains:
1. Algorithms.cpp : For a quick glance at all the
algorithms that we have tried so far.
2. *.txt files : For each test, Java2D Bench reports the average
metrics/second. The text file contains a collection of all such
average metrics for nearly 16 different test cases.
3. res output : .res output from each of the test runs.
Observation from J2DBench Reports
1. I could not get time to run the Analyzer tool across
each of these *.res files. I shall do that on Monday.
2. From the summary text ( average chars per second ) that
J2DBench reported, the un-optimized solution seems to be better for
the IntArgb pixel format, and the optimized solution is better for
the IntArgb_Pre pixel format by a significant margin.
3. I tried to improve the optimized algorithm (based on
SRCOVER_MASKFILL) further by adding an if (dstA) { ...
Though there is a marginal improvement, the optimized solution could
not exceed the numbers of the regular algorithm (for the IntArgb
pixel format).
This could be due to the extra calculations that we do in order to
load the color components separately.
However, for the IntArgb_Pre pixel format, the optimized solution is
way ahead and gives better performance.
4. In the summary reports, you will find
CompatibleBufferedImage( Translucent ) as well.
In a separate experiment, I found that the pixel format for the
compatible buffered image got mapped to IntArgb_Pre.
Thus, the performance numbers match those of the
IntArgb_Pre pixel format.
At the present moment, I'm unsure which solution would be better
to fix the bug.
Kindly share your views and thoughts.
Thank you
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Thursday, March 31, 2016 1:46 AM
To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net; Sergey Bylokhov
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
Benchmarks must run for a significant period of time to be valid.
Measuring with nanoTime() is fine, but the run times should exceed at
least a couple of seconds, not a few nanoseconds. You might want to
use Java2DBench instead (in src/demo/share/java2d in the java.desktop
repo), which calibrates test times, does multiple runs, and includes
an analyzer to compare results from multiple test runs...
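One way to act on this is to time a short probe first and then scale the repetition count so the measured run lasts a couple of seconds. The helper below is a hypothetical sketch of that scaling step only; it is not how Java2DBench calibrates, and the name and parameters are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given a probe that executed probeReps repetitions in probeNanos,
 * return the number of repetitions needed for the timed run to last
 * at least targetNanos (rounded up, never fewer than the probe).
 */
static int64_t calibrate_reps(int64_t probeReps, int64_t probeNanos,
                              int64_t targetNanos)
{
    int64_t reps;
    assert(probeReps > 0 && targetNanos > 0);
    if (probeNanos <= 0) {
        probeNanos = 1;            /* clock too coarse for the probe */
    }
    reps = (targetNanos * probeReps + probeNanos - 1) / probeNanos;
    return reps < probeReps ? probeReps : reps;
}
```

For example, if 20 repetitions took 10 ms in the probe, a 2 second target calls for 4000 repetitions.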
...jim
On 3/30/2016 4:27 AM, Prahalad Kumar Narayanan wrote:
Hello Jim and everyone on the Java2D Group
Good day to you.
A quick follow-up to the review request on bug:
Bug : JDK-8015070
Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070
Thank you Jim for the detailed feedback.
It takes a lot of time not only to perform the review, but also to put
the feedback into clear words.
With each review, the solution definitely gets better and better. I'm
happy about it...! :)
Quick Inferences from previous feedback:
Incorporating the fast path into the current logic:
1. I agree with you on this point, and I noticed this when we modelled
the solution as per the mask fill.
2. I ignored it initially for two reasons:
a. The statement - if (resA != MaxValFor4ByteArgb)...
follows the srcColor pre-multiplication step, and this ensures that most of
the blending calculations pertaining to the destination are skipped.
b. Any additions / tweaks to the current logic might
deviate from the SRCOVER_MASKFILL reference model.
Many a time, keeping code with similar logic consistent
across implementations helps a lot.
3. As you said, including the fast path will avoid a few
multiplications and if checks too.
The changes are available in the current webrev.
Link:
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
Profiling method, and Metrics:
1. The profiling done yesterday was a mere
execution of the regression test (available in the webrev), with time
measured using the System.currentTimeMillis API.
Since only one iteration was run, the time soared into
milliseconds. This could be due to the internal glyph caching mechanism.
2. Today, I evolved the regression test into a benchmark that does
the following:
a. Select Font style : {Plain, Bold, Italic, Bold Italic}
b. Select Font size : {20, 40, 60, 80}
c. Trigger drawString once before the benchmark is run. This is
to ensure that the font subsystem is done with glyph caching internally.
d. Use the famous string that has all characters: "The quick brown
fox jumps over the lazy dog. 0123456789."
e. For every style-size combination, run the test
for 20 iterations and take the average. (Note: Font is fixed to
'verdana' )
f. Change the time measurement from System.currentTimeMillis to
System.nanoTime() and reduce elapsedTime to microseconds.
3. With the above setup in code, my observations on Windows are as
follows:
a. In many cases, the optimized logic is either equal to
or better in performance than the un-optimized logic.
The difference is very small: a few tens to a few
hundreds of microseconds.
b. The optimized algorithm improves performance for
pre-multiplied alpha based destination buffers.
c. There are occasional huge deviations where the optimized
logic seems to take a longer time.
These could be due to external factors
like stalls for memory, bus I/O, etc.
Since the deviation is in microseconds, I
believe it may not be a concern.
d. The complete list of time measurements taken
on the Windows x86_64 release build is as follows.
I'm not sure how the data appears in the final
webrev email.
Kindly excuse me if the data is unreadable.
Platform : Windows x86_64 Release Build
Algorithm : Unoptimized. webrev.00
Executing Bench For Image Type: IntArgb
-Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742
-Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395
-Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474
-Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079
-Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380
-Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301
-Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631
-Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812
-Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059
-Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180
-Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558
-Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169
-Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 426.988
-Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 374.064
-Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 732.375
-Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 864.68
Executing Bench For Image Type: IntArgb_Pre
-Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768
-Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299
-Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941
-Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372
-Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096
-Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589
-Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972
-Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894
-Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028
-Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300
-Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379
-Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139
-Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 93.243
-Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 475.406
-Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 280.085
-Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 357.486
Platform : Windows x86_64 Release Build
Algorithm : Optimized. webrev.03
Executing Bench For Image Type: IntArgb
-Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954
-Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871
-Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799
-Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390
-Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566
-Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054
-Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735
-Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559
-Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289
-Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750
-Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181
-Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172
-Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 375.336
-Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 571.371
-Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 548.300
-Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 714.526
Executing Bench For Image Type: IntArgb_Pre
-Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026
-Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016
-Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617
-Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829
-Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809
-Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563
-Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049
-Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802
-Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320
-Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473
-Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488
-Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782
-Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 54.687
-Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 235.505
-Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 227.205
-Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 324.308
Updated webrev with changes for the fast path :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
Kindly review and provide your suggestions.
Thank you once again for the detailed review and feedback.
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Wednesday, March 30, 2016 2:46 AM
To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net; Sergey Bylokhov
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
This latest version looks like it should produce correct answers.
I'd like to see benchmark results on more platforms, particularly Windows since
the different compilers may optimize these things differently.
Also, you didn't mention what data set you used for benchmarking. I'd
like to see benchmark results for small, medium and large font sizes,
and possibly bold and italic fonts as well. The reason is that the
relative ratios of "empty glyph pixels" to "partial glyph pixels" to
"fully covered glyph pixels" change depending on the font type and
size, so if you made one of those faster and another slower then the
results may be seen as a gain in one type of test if you only test one
font type and size and it happens to match the part of the code that
is more streamlined. Also, for small font sizes the per-glyph
overhead might hide per-pixel issues. Please share which fonts and
sizes you used for testing and consider using some different sizes,
including something very large like 36 or 48 points (something with
large areas of opaque pixels) as well as more normal sizes that appear
in GUIs. Also, bold fonts can have a higher percentage of opaque pixels.
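To get a feel for how those three pixel classes are distributed in a given glyph, a coverage mask can simply be counted. The sketch below is illustrative only: the Coverage struct, the function name and the 4x4 sample mask are made up, and a real measurement would use the masks produced by the font rasterizer.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int empty, partial, full; } Coverage;

/* Counts empty (0), partially covered (1..254) and fully covered (255)
 * samples in a glyph coverage mask. */
static Coverage classify_mask(const uint8_t *mask, int count)
{
    Coverage c = { 0, 0, 0 };
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        if (mask[i] == 0) {
            c.empty++;
        } else if (mask[i] == 255) {
            c.full++;
        } else {
            c.partial++;
        }
    }
    return c;
}

/* Made-up 4x4 mask standing in for a rasterized glyph. */
static const uint8_t SAMPLE_MASK[16] = {
    0,   0,  64,   0,
    0, 128, 255,  32,
    0, 255, 255,  96,
    0,   0, 200,   0
};
```

Each class exercises a different branch of the blend loop, which is why a single font and size can give a misleading picture.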
In particular...
This latest version is missing the "fast path" from the existing code for the
case of srcA == 255 and glyphA == 255 where it just stores the FG_PIXEL values directly.
For large fonts/glyphs that case may be as important as detecting empty glyph pixels.
On the other hand, an additional "if" may cause the compiler to generate less efficient
code as per Sergey's concern. Since this "if" eliminates some multiplies and possibly
some divides, I'd hope it would be a win-win.
You could add the fast path back inside the case where mixValSrc is 255 and just test
srcA for 255 and do the Store ## DST ## PixelData() macro that used to be at the end of
the block in the existing code, and then use "break;" to escape out of the
do/while surrounding the whole macro so it moves on to the next pixel.
(Arguably, we might want to consider teaching our SRCOVER_MASKFILL to
do the same thing. I think that was one of the added values of having
a separate GLYPH loop, but really both should be optimizing that
case...)
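The "break" trick relies on the do/while (0) wrapper: inside the macro, "break" leaves only that wrapper, so the enclosing pixel loop carries on with the next pixel. Here is a simplified, hypothetical single-channel version to show the control flow; the macro, the helper functions and the sample values are invented for this sketch and do not match the real 4-byte ARGB macro.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 8-bit multiply (rounding is an assumption). */
static int mul8(int a, int b) { return (a * b + 127) / 255; }

/*
 * Hypothetical gray, opaque-destination blend. "break" exits the
 * do/while (0) wrapper, which ends the work for this pixel only.
 */
#define BLEND_GRAY_PIXEL(DST_PTR, INDEX, MIX, SRC_A, FG) \
    do { \
        int mix_ = (MIX); \
        int a_; \
        if (mix_ == 0) { \
            break;                       /* empty glyph pixel */ \
        } \
        if (mix_ == 0xff && (SRC_A) == 0xff) { \
            (DST_PTR)[INDEX] = (FG);     /* fast path: plain store */ \
            break; \
        } \
        a_ = mul8(mix_, (SRC_A)); \
        (DST_PTR)[INDEX] = (uint8_t)(mul8(a_, (FG)) + \
                                     mul8(0xff - a_, (DST_PTR)[INDEX])); \
    } while (0)

static void blend_gray_row(uint8_t *dst, const uint8_t *mask, int width,
                           int srcA, uint8_t fg)
{
    int x;
    assert(width >= 0);
    for (x = 0; x < width; x++) {
        BLEND_GRAY_PIXEL(dst, x, mask[x], srcA, fg);
    }
}

/* Blends a three-pixel sample row and returns pixel i of the result. */
static int demo_pixel(int i)
{
    uint8_t dst[3] = { 10, 10, 10 };
    const uint8_t mask[3] = { 0, 255, 128 };
    blend_gray_row(dst, mask, 3, 255, 200);
    return dst[i];
}
```

In the sample row the first pixel is skipped, the second takes the fast path, and only the third goes through the multiplies.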
I can see now that the macro switch to use the same macro set as SRCOVER_MASKFILL required you to
compute the pixel address, as you noted in your summary. It makes the new macro more cumbersome
and makes it look like it's doing a bit more work per-pixel, but in reality I think the overall
operations end up being the same as long as the compiler optimizes the deliberate multiplications
the same way it did for implicit multiplications in the "pRas[foo]" and
"pRas[foo*4]" code that was being generated previously. Benchmarks will tell us if we're
good there...
...jim
On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote:
Hello everyone on the Java2D Group
Good day to you.
This is a follow-up to the review request on bug:
Bug : JDK-8015070
Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070
First, thanks to Jim and Sergey for their feedback on the changes so far.
Inferences from Jim's feedback on loading destination colors:
1. The API or macro that I had earlier used to load DST colors did indeed
adjust the destination color components with divide-by-alpha if the
destination was already pre-multiplied.
My apologies. I should have spotted this error in the
first iteration itself.
2. Due to the divide-by-alpha adjustment, the remaining
logic would become incorrect ( especially the multiplication with
dstF based on pre-multiplication status ).
3. The correct API is being used now, and the dstColor components are
loaded directly without any divide-by-alpha adjustment.
Inferences from Sergey's feedback on performance:
1. To set the context for everyone, the logic present in the current
webrev.02 is modelled as per SRCOVER_MASKFILL.
There are multiple if (...) conditions that remove
unnecessary blending calculations. Ideally this should improve performance.
However, since some data are not readily available (as they are
in SRCOVER_MASKFILL), a few additional calculations have been added.
They are: pre-multiplying srcColor with alpha and assigning it
to res, and finding the correct address of the pixel
using DST_PTR and PixelStride.
Hence, as Sergey suggests, observing the performance
will be beneficial.
2. The performance of the new logic was measured with the
linux-x86_64-normal-server-release config and compared with the
logic used in the un-optimized code in webrev.00.
3. Result: The newer logic provides a fractional gain (about 15 - 20
ms) over the older logic.
Other Subtle Changes:
1. The test file has been renamed from
AADrawStringArtifact.java to the more meaningful
AntialiasedTextArtifact.java.
2. The test file tests both the TYPE_INT_ARGB and TYPE_INT_ARGB_PRE
BufferedImage formats.
The code has been well commented to explain the logic used
in every function.
Kindly take your time to review the changes in the webrev link mentioned below
and provide your suggestions.
Webrev Link:
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/
Thank you for your time in review
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Thursday, March 24, 2016 7:57 AM
To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
(On a side note - ouch! I came up with these macros in the first
place, but 20 years later I'm now realizing just how hard they are to
navigate and review. My apologies...)
The macro you are still using to load the destination data is one that loads it
to non-premultiplied components, which means it will divide out the alpha if
the destination is PRE. The rest of the logic assumes that the components were
loaded without any adjustment of their premultiplication so not only is that
division an unnecessary operation, it makes the following math wrong.
The SRCOVER_MASKFILL macro seems to use "Postload ## STRATEGY ## From ## TYPE" which
seems to load them into separate components without any adjustment of their pre-multiplication
status. This is paired with "LoadAlphaFrom ## TYPE ## For ## STRATEGY" to load just the
destination alpha for computing dstF...
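A small numeric sketch shows why the divided-out load breaks the later math for a PRE destination. The helper names and the rounding in mul8()/div8() are assumptions for illustration; they stand in for the macros and tables in the real code.

```c
#include <assert.h>

/* Stand-ins for the 8-bit multiply/divide helpers (rounding is assumed). */
static int mul8(int a, int b) { return (a * b + 127) / 255; }
static int div8(int c, int a) {
    int v = (c * 255 + a / 2) / a;
    return v > 255 ? 255 : v;
}

/* Load as stored: the component stays premultiplied. */
static int load_as_stored(int comp, int alpha)
{
    (void)alpha;
    return comp;
}

/* Load to non-premultiplied form: divides the alpha back out. */
static int load_non_premultiplied(int comp, int alpha)
{
    return alpha ? div8(comp, alpha) : 0;
}

/* Destination term of SrcOver for a premultiplied format: dstF * comp. */
static int dst_term_pre(int loadedComp, int dstF)
{
    assert(dstF >= 0 && dstF <= 255);
    return mul8(dstF, loadedComp);
}
```

With a stored component of 100 at alpha 128 and dstF = 127, the as-stored load contributes 50 while the divided-out load contributes 99, roughly twice as bright.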
...jim
On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote:
Hello everyone on the Java2D Group
Good day to you.
This is a follow-up to the review request on the bug:
Bug : JDK-8015070 Anti-aliased Text on Translucent
background gets bright artifacts
Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070
First, sincere thanks to Jim for his valuable feedback.
1. As Jim correctly mentioned, SRCOVER_MASKFILL has similar logic to
blend two translucent colors based on an alpha mask.
2. The calculations are exactly the same as the changes in the previous
webrev.
However, the logic of SRCOVER_MASKFILL is 'optimized' to
reduce the number of computations.
3. This optimization is definitely required because the logic is
executed for every single pixel in a glyph.
Example: If a string is made up of 5 English
characters with each character having 32 x 32 pixels,
the anti-aliasing logic will be executed
for 5 x 32 x 32 = 5120 iterations.
Reducing the computation per pixel implies a huge benefit
for the complete drawString operation.
Observation from SRCOVER_MASKFILL
1. The mask fill reduces computations through multiple if (...)
conditions.
Each if condition checks whether the next set of
computations is required.
2. Optimization 1: If the mask value is 0, skip the entire logic.
3. Optimization 2: If the mask value is maximum, say 255, take
srcA directly without multiplying with maskAlpha ( Reason: 1 *
srcAlpha = srcAlpha ).
4. Optimization 3: Compute the pre-multiplied resColor in two steps: first
with srcColor and then with dstColor.
If the resAlpha from the 1st step (i.e., srcColor) is fully
opaque, avoid blending dstColor altogether.
Changes in Current Webrev.01
1. The fix for the current bug is modelled on SRCOVER_MASKFILL.
2. The changes have been verified to work on Windows, Linux and Mac
operating systems.
3. The automated test file, AADrawStringArtifact.java, runs
as expected:
it identifies the artifact and throws an exception when run on JDK 7 and
8.
With JDK 9, the test file returns without error.
3. JPRT build has been run to ensure, changes build on all supported
platforms.
JPRT job link :
http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.prahnara-linux.client
Kindly review the changes in the below mentioned link and provide
your views
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/
Thank you for your time in review
Have a good day
Prahalad N.
-----Original Message-----
From: Jim Graham
Sent: Friday, March 18, 2016 6:07 AM
To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
Antialiased text on translucent backgrounds gets bright artifacts
Hi Prahalad,
This basically boils down to "alpha blending math needs to be performed in
premultiplied form" and the existing code was not doing that.
Your changes do add a pre-multiplication step to the math in two places:
when you mix the src alpha and the glyph alpha at the top of the macro, and
again when you do the Multiply(dst, dstA, dst) step in the middle. In that
sense, the new math is correct.
However, it is not the optimal way to implement this. In
particular, the macro used here to load the data from the
destination is the one that loads it into 4 ARGB non-premultiplied
values. If the destination is non-PRE, then your added multiply
step is exactly what is needed, but it could be combined with the
multiply that will happen later in the blending equation, so it is
an added step rather than a fixed fraction in the existing MultMultAdd
parameters.
Additionally, if the destination is already PRE, then the macro
being used to load the dst pixel data there will have done a divide
step to divide out the alpha from the destination, only to have you
reverse that math with your new Multiply() step. That's a lot of math
to end up with a NOP.
The MUL8 you added for the srcA and glyph value is needed, that change is good.
Since it is common for glyph pixels to have zero alpha, you might want to test
the glyph alpha for 0 and skip the pixel before you do the multiply, though.
This would add one more if, but it would be a common case.
The trickier part is to load the destination components without un-premultiplying them.
Unfortunately there is no "Load...ToArgbPre"
macro to parallel the Load macro used in the function. Perhaps there should
be, but you'd still end up with an extra multiply step as I mentioned above
because you can fold the premultiplication of the dst data into the MultMultAdd
by carefully choosing the parameters you use in the math there.
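One way to read the folding described above, for a non-premultiplied destination, is sketched below. It is an assumption-laden illustration rather than the actual LoopMacros.h code: mul8, div8, PixelNonPre and blend_over_nonpre are hypothetical names, and the source components are assumed to be premultiplied and already scaled by the glyph alpha. The factor dstF carries both (1 - srcA) and dstA, so the destination components are multiplied only once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rounded (a * b) / 255 for 8-bit values. */
static uint8_t mul8(uint8_t a, uint8_t b) {
    return (uint8_t)((a * b + 127) / 255);
}

/* Hypothetical helper: rounded (c * 255) / a, clamped to 255. */
static uint8_t div8(uint8_t c, uint8_t a) {
    assert(a != 0);
    unsigned v = (c * 255u + a / 2u) / a;
    return (uint8_t)(v > 255u ? 255u : v);
}

/* Non-premultiplied ARGB pixel; the type name is illustrative only. */
typedef struct { uint8_t a, r, g, b; } PixelNonPre;

/* src* are premultiplied and already include the glyph coverage. */
static void blend_over_nonpre(PixelNonPre *dst, uint8_t srcA,
                              uint8_t srcR, uint8_t srcG, uint8_t srcB) {
    assert(srcR <= srcA && srcG <= srcA && srcB <= srcA);

    /* Fold dst premultiplication into the blend factor:
     * dstF = (1 - srcA) * dstA, applied directly to the raw components. */
    uint8_t dstF = mul8((uint8_t)(0xff - srcA), dst->a);
    uint8_t resA = (uint8_t)(srcA + dstF);
    uint8_t resR = (uint8_t)(srcR + mul8(dstF, dst->r));
    uint8_t resG = (uint8_t)(srcG + mul8(dstF, dst->g));
    uint8_t resB = (uint8_t)(srcB + mul8(dstF, dst->b));

    /* The result is premultiplied; divide out alpha only when needed. */
    if (resA != 0 && resA != 0xff) {
        resR = div8(resR, resA);
        resG = div8(resG, resA);
        resB = div8(resB, resA);
    }
    dst->a = resA;
    dst->r = resR;
    dst->g = resG;
    dst->b = resB;
}
```

For a premultiplied destination the same structure would apply with dstF = 0xff - srcA and no final divide, since the stored components already include dstA.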
The good news is that the way that the SRCOVER_MASKFILL uses the various
type-specific macros works around this a bit and minimizes the number of
multiplies that happen. You could check out DEFINE_SRCOVER_MASKFILL and see
how it works in the case where pMask is not null (pMask is an alpha mask with
values very similar to the glyphAA data). Modeling this code on that code
would correct the math and minimize it as well...
...jim
On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote:
Hello Everyone on Java2D Group
Good day to you.
Herewith, I'm sharing the webrev for two identical Java2D bugs.
Bug ID : JDK-8015070
Title : Antialiased text on translucent backgrounds gets
bright artifacts
Link :https://bugs.openjdk.java.net/browse/JDK-8015070
Bug ID : JDK-8013564 ( currently closed as duplicate )
Title : Font rendering anti-aliasing in translucent
BufferedImages broken
Link :https://bugs.openjdk.java.net/browse/JDK-8013564
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
Quick Summary on Bugs :
````````````````````````````````````````````````
1. Artifacts appear on the edges of text characters when anti-aliased
text is drawn on a translucent background.
2. The issue is reproducible on all platforms - windows, linux and mac os.
3. Besides, the issue is reproduced with the commonly used pixel
format- 4ByteArgb.
Root Cause & Solution :
````````````````````````````````````````````````
1. The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h uses
the standard blending algorithm
dstColor = [ srcColor * glyphAlpha + dstColor * (1 - glyphAlpha) ] / dstAlpha
2. The above equation works only when the srcColor and dstColor are Opaque.
3. When srcColor and dstColor are Translucent, their
alphaComponent will influence the visibility of the color, and
visibility of the color below.
4. The new set of calculations for blending Translucent source and
destination colors is given as
resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha))
resColor = [ srcColor * srcAlpha + dstColor * dstAlpha * (1 - srcAlpha) ] / resAlpha
5. Reference text for the equation:
https://en.wikipedia.org/wiki/Alpha_compositing
6. With the above modification to the blending logic, the
artifacts do not appear and issues are fixed.
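As a reference for the equations in point 4, here is a minimal floating-point sketch for one color channel. The names Sample and blend_translucent are hypothetical, values are assumed to be normalized to [0, 1], and the glyph coverage is assumed to be already folded into src.a. It is meant only for checking the math, not as the per-pixel implementation.

```c
#include <assert.h>

/* Alpha plus one non-premultiplied color channel, both in [0, 1].
 * The type name is illustrative only. */
typedef struct { double a, c; } Sample;

/* resAlpha = 1 - (1 - srcAlpha) * (1 - dstAlpha)
 * resColor = [srcColor * srcAlpha + dstColor * dstAlpha * (1 - srcAlpha)] / resAlpha */
static Sample blend_translucent(Sample src, Sample dst) {
    assert(src.a >= 0.0 && src.a <= 1.0);
    assert(dst.a >= 0.0 && dst.a <= 1.0);

    Sample res;
    res.a = 1.0 - (1.0 - src.a) * (1.0 - dst.a);
    if (res.a == 0.0) {
        /* Both inputs fully transparent: the color is undefined, use 0. */
        res.c = 0.0;
        return res;
    }
    res.c = (src.c * src.a + dst.c * dst.a * (1.0 - src.a)) / res.a;
    return res;
}
```

With an opaque destination (dst.a = 1) the result alpha is 1 and the color term reduces to srcColor * srcAlpha + dstColor * (1 - srcAlpha), which matches the opaque-only formula in point 1.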
Jtreg & Jprt Results :
````````````````````````````````````````````````
1. A simple test-file: AADrawStringArtifact.java has been created
to be a part of jtreg test cases.
The test file is well commented to explain the nature of the
artifact and how the test tries to identify it.
As required, the test case fails with Jdk 7 and Jdk 8, and
succeeds with Jdk 9-internal (built with changes for the bug fix)
2. The changes to the blending logic lie within
java.desktop/src/share/native/...
Hence, JPRT was used to ensure a successful build
across all supported platforms
Jprt Job Link :
http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001.prahnara-linux.client
The build succeeds on all platforms.
Kindly review the webrev link and provide your views and suggestions.
Webrev Link :
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
If the changes look good, we could take up the changes for source checkin.
Thank you for your time in review
Have a good day
Prahalad N.