Re: [OpenJDK Rasterizer] AWT & gcc 4.8 optimization options

Laurent Bourgès Fri, 15 Jan 2016 09:15:19 -0800

Sergey,

Did you make any progress?


I finally looked at the preprocessed C code and also enabled the
-ftree-vectorizer-verbose output:
    CFLAGS := -save-temps -ftree-vectorize -ftree-vectorizer-verbose=2 $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \


I looked at the IntArgbPreSrcMaskFill hotspot (in my EllipseFillTest)
according to oprofile:
samples  %        image name               symbol name
469141   30.0043  libawt.so                IntArgbPreSrcMaskFill


Here is the preprocessed C code. It is still hard to read because macro
expansion leaves many do { } while (0) blocks:

void IntArgbSrcMaskFill (void *rasBase, jubyte *pMask, jint maskOff,
                         jint maskScan, jint width, jint height, jint fgColor,
                         SurfaceDataRasInfo *pRasInfo, NativePrimitive *pPrim,
                         CompositeInfo *pCompInfo)
{
    jint srcA;
    jint srcR, srcG, srcB;
    jint rasScan = pRasInfo->scanStride;
    IntArgbDataType *pRas = (IntArgbDataType *) (rasBase);
    jint DstPix;
    do
    {
        (srcB) = (fgColor) & 0xff;
        (srcG) = ((fgColor) >> 8) & 0xff;
        (srcR) = ((fgColor) >> 16) & 0xff;
        (srcA) = ((fgColor) >> 24) & 0xff;
    }
    while (0);
    if (srcA == 0)
    {
        srcR = srcG = srcB = 0;
        fgColor = 0;
    }
    else
    {
        if (!(0))
        {
            fgColor = (srcA << 24) | (fgColor & 0x00ffffff);
            ;
        }
        if (srcA != 0xff)
        {
            do
            {
                srcR = mul8table[srcA][srcR];
                srcG = mul8table[srcA][srcG];
                srcB = mul8table[srcA][srcB];
            }
            while (0);
        }
        if (0)
        {
            ;
        }
    }
    DstPix = 0;
    ;
    rasScan -= width * 4;
    if (pMask)
    {
        pMask += maskOff;
        maskScan -= width;
        do
        {
            jint w = width;
            ;
            do
            {
                jint resA;
                jint resR, resG, resB;
                jint dstF;
                jint pathA = *pMask++;
                if (pathA > 0)
                {
                    if (pathA == 0xff)
                    {
                        (pRas)[0] = (fgColor);
                    }
                    else
                    {
                        ;
                        dstF = 0xff - pathA;
                        do
                        {
                            DstPix = (pRas)[0];
                            resA = ((juint) DstPix) >> 24;
                        }
                        while (0);
                        resA = mul8table[dstF][resA];
                        if (!(0))
                        {
                            dstF = resA;
                        }
                        resA += mul8table[pathA][srcA];
                        do
                        {
                            resR = (DstPix >> 16) & 0xff;
                            resG = (DstPix >> 8) & 0xff;
                            resB = (DstPix >> 0) & 0xff;
                        }
                        while (0);
                        do
                        {
                            resR = mul8table[dstF][resR] + mul8table[pathA][srcR];
                            resG = mul8table[dstF][resG] + mul8table[pathA][srcG];
                            resB = mul8table[dstF][resB] + mul8table[pathA][srcB];
                        }
                        while (0);
                        if (!(0) && resA && resA < 0xff)
                        {
                            do
                            {
                                resR = div8table[resA][resR];
                                resG = div8table[resA][resG];
                                resB = div8table[resA][resB];
                            }
                            while (0);
                        }
                        (pRas)[0] = (((((((resA) << 8) | (resR)) << 8) | (resG)) << 8) | (resB));
                    }
                }
                pRas = ((void *) (((intptr_t) (pRas)) + (4)));
                ;
            }
            while (--w > 0);
            pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
            ;
            pMask = ((void *) (((intptr_t) (pMask)) + (maskScan)));
        }
        while (--height > 0);
    }
    else
    {
        do
        {
            jint w = width;
            ;
            do
            {
                (pRas)[0] = (fgColor);
                pRas = ((void *) (((intptr_t) (pRas)) + (4)));
                ;
            }
            while (--w > 0);
            pRas = ((void *) (((intptr_t) (pRas)) + (rasScan)));
            ;
        }
        while (--height > 0);
    }
}
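
For readers unfamiliar with the idiom: the do { } while (0) wrappers in the
listing above are what a multi-statement macro typically expands to. A minimal
sketch, assuming a hypothetical macro name EXTRACT_ARGB and helper alpha_of
(neither is the actual AWT macro):

```c
#include <stdint.h>

/* Hypothetical macro, named for illustration only. Wrapping the body in
 * do { } while (0) makes the expansion behave as one statement, so it
 * stays correct when used after an unbraced if/else. */
#define EXTRACT_ARGB(pixel, a, r, g, b) \
    do {                                \
        (b) = (pixel) & 0xff;           \
        (g) = ((pixel) >> 8) & 0xff;    \
        (r) = ((pixel) >> 16) & 0xff;   \
        (a) = ((pixel) >> 24) & 0xff;   \
    } while (0)

/* Hypothetical helper: returns the alpha component of an ARGB pixel. */
static uint32_t alpha_of(uint32_t pixel)
{
    uint32_t a, r, g, b;
    EXTRACT_ARGB(pixel, a, r, g, b);
    (void) r; (void) g; (void) b;   /* unused in this sketch */
    return a;
}
```

After preprocessing, each use of such a macro leaves one do { ... } while (0);
block in the output, which matches the shape of the code above.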

It seems that the alpha blending macros are too complex to be vectorized:

Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad inner-loop form.
IntArgb.c:109: note: not vectorized: Bad inner loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: not vectorized: control flow in loop.
IntArgb.c:109: note: bad loop form.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: failed: evolution of base is not affine.
IntArgb.c:109: note: bad data references.
Analyzing loop at IntArgb.c:109
IntArgb.c:109: note: Unknown misalignment, is_packed = 0
IntArgb.c:109: note: virtual phi. skip.
IntArgb.c:109: note: not vectorized: value used after loop.
IntArgb.c:109: note: bad operation or unsupported loop bound.
IntArgb.c:109: note: vectorized 0 loops in function.
IntArgb.c:109: note: not consecutive access rasScan_26 =
pRasInfo_25(D)->scanStride;
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _40 =
mul8table[srcA_36][srcB_33];
IntArgb.c:109: note: not consecutive access _42 =
mul8table[srcA_36][srcB_31];
IntArgb.c:109: note: not consecutive access _44 =
mul8table[srcA_36][srcB_29];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *pMask_1
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Unknown alignment for access: mul8table
IntArgb.c:109: note: not consecutive access _65 =
mul8table[dstF_60][resA_64];
IntArgb.c:109: note: not consecutive access _67 =
mul8table[pathA_58][srcA_36];
IntArgb.c:109: note: not consecutive access _75 =
mul8table[dstF_66][resR_71];
IntArgb.c:109: note: not consecutive access _77 =
mul8table[pathA_58][srcB_6];
IntArgb.c:109: note: not consecutive access _80 =
mul8table[dstF_66][resG_73];
IntArgb.c:109: note: not consecutive access _82 =
mul8table[pathA_58][srcB_7];
IntArgb.c:109: note: not consecutive access _85 =
mul8table[dstF_66][resB_74];
IntArgb.c:109: note: not consecutive access _87 =
mul8table[pathA_58][srcB_8];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: Unknown alignment for access: div8table
IntArgb.c:109: note: not consecutive access _93 =
div8table[resA_69][resR_79];
IntArgb.c:109: note: not consecutive access _95 =
div8table[resA_69][resG_84];
IntArgb.c:109: note: not consecutive access _97 =
div8table[resA_69][resB_89];
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_9
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.
IntArgb.c:109: note: SLP: step doesn't divide the vector-size.
IntArgb.c:109: note: Unknown alignment for access: *rasBase_11
IntArgb.c:109: note: Failed to SLP the basic block.
IntArgb.c:109: note: not vectorized: failed to find SLP opportunities in
basic block.
IntArgb.c:109: note: not vectorized: not enough data-refs in basic block.


Any ideas on how to make this code faster, or how to make it vectorizable?
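
One direction that might be worth trying, given that the log above mostly
complains about control flow in the loop and non-consecutive accesses into
mul8table: replace the table lookups with arithmetic and drop the per-pixel
branches. The sketch below does this for a premultiplied destination only. It
assumes mul8table[a][b] approximates round(a * b / 255); mul8 and
blend_row_pre are hypothetical names, not existing AWT code, and whether
gcc 4.8 actually vectorizes this loop would still need to be checked with
-ftree-vectorizer-verbose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* round(a * b / 255) for a, b in [0, 255], computed without a table. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: blends one row of premultiplied ARGB pixels with a
 * premultiplied source color, weighted by an 8-bit coverage mask.
 * No branches and no table lookups inside the loop. */
static void blend_row_pre(uint32_t *restrict dst, const uint8_t *restrict mask,
                          uint32_t src, size_t width)
{
    assert(dst != NULL && mask != NULL);

    const uint32_t srcA = src >> 24;
    const uint32_t srcR = (src >> 16) & 0xff;
    const uint32_t srcG = (src >> 8) & 0xff;
    const uint32_t srcB = src & 0xff;

    for (size_t i = 0; i < width; i++) {
        uint32_t pathA = mask[i];
        uint32_t dstF = 0xff - pathA;
        uint32_t d = dst[i];

        /* Each sum stays <= 255 because dstF + pathA == 255. */
        uint32_t a = mul8(dstF, d >> 24) + mul8(pathA, srcA);
        uint32_t r = mul8(dstF, (d >> 16) & 0xff) + mul8(pathA, srcR);
        uint32_t g = mul8(dstF, (d >> 8) & 0xff) + mul8(pathA, srcG);
        uint32_t b = mul8(dstF, d & 0xff) + mul8(pathA, srcB);

        dst[i] = (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

The pathA == 0 and pathA == 0xff cases fall out of the arithmetic (the pixel
is left unchanged or replaced by the source), so the special-case branches of
the original loop are not needed for correctness, only as a possible fast path.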


Finally, I noticed that the macros with the Lcd suffix seem to perform proper
gamma correction:

void IntArgbDrawGlyphListLCD(SurfaceDataRasInfo *pRasInfo, ImageRef *glyphs,
                             jint totalGlyphs, jint fgpixel, jint argbcolor,
                             jint clipLeft, jint clipTop, jint clipRight,
                             jint clipBottom, jint rgbOrder,
                             unsigned char *gammaLut,
                             unsigned char *invGammaLut,
                             NativePrimitive *pPrim, CompositeInfo *pCompInfo)
...
    srcR = invGammaLut[srcR];
    srcG = invGammaLut[srcG];
    srcB = invGammaLut[srcB];
...
alpha blending
...
    dstR = gammaLut[dstR];
    dstG = gammaLut[dstG];
    dstB = gammaLut[dstB];

That is exactly what I need in order to implement correct gamma correction in
the mask fill operations (shape draw / fill) of the software loops (buffered
image rendering).
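
As a rough illustration of that idea, the same three steps (inverse gamma
lookup, alpha blend, gamma lookup) could be applied per color channel in a
mask fill. This is only a sketch: blend_channel_gamma and mul8 are
hypothetical names, the destination is assumed to be opaque, and it assumes
the destination component also has to go through invGammaLut before blending,
which the excerpt above does not show.

```c
#include <assert.h>
#include <stdint.h>

/* round(a * b / 255) for a, b in [0, 255]; stands in for mul8table here. */
static inline uint32_t mul8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical helper: blends one 8-bit color component of an opaque
 * destination with the source component, weighted by the coverage pathA.
 * invGammaLut maps stored values into the blending space and gammaLut maps
 * the blended result back, mirroring the order used in the LCD glyph loop. */
static uint32_t blend_channel_gamma(uint32_t dst, uint32_t src, uint32_t pathA,
                                    const unsigned char *gammaLut,
                                    const unsigned char *invGammaLut)
{
    assert(dst <= 0xff && src <= 0xff && pathA <= 0xff);

    uint32_t s = invGammaLut[src];
    uint32_t d = invGammaLut[dst];
    uint32_t res = mul8(0xff - pathA, d) + mul8(pathA, s);

    return gammaLut[res];
}
```

In a real loop the source lookups would be done once per fill rather than per
pixel, since the fill color is constant.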

I will now try to figure out how that C code is generated by the nested
macros!

Laurent
