Sergey,

Thanks a lot for your advice. I will definitely try your approach of reading the 'preprocessed' C code (as I do not much like macros).
I think I will have some time during the winter holidays to implement correct gamma correction = pow(2.2), blend, then pow(1/2.2) (using precomputed tables) in that C code.

>> If you modify the Maskfill C code, could you explain to me how it works, as
>> I would like to implement the correct gamma correction in this software
>> loop in the future?
>>
>
> From the current source code point of view it is not an easy task to
> understand how it works. The easiest way to study it is to compile the jdk
> using this option in Awt2dLibraries.gmk
>
> --- a/make/lib/Awt2dLibraries.gmk Tue Dec 08 19:50:14 2015 +0300
> +++ b/make/lib/Awt2dLibraries.gmk Wed Dec 09 17:10:55 2015 +0300
> @@ -242,7 +242,7 @@
>      EXCLUDES := $(LIBAWT_EXCLUDES), \
>      EXCLUDE_FILES := $(LIBAWT_EXFILES), \
>      OPTIMIZATION := LOW, \
> -    CFLAGS := $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \
> +    CFLAGS := -save-temps $(CFLAGS_JDKLIB) $(LIBAWT_CFLAGS), \
>      DISABLED_WARNINGS_gcc := sign-compare unused-result maybe-uninitialized \
>          format-nonliteral parentheses, \
>      DISABLED_WARNINGS_clang := logical-op-parentheses extern-initializer, \
>
> This will save the result of the preprocessor. It will also save the
> assembler code, which can be useful to investigate how the compiler
> optimizes the code, especially in the case of vectorization.
>
> When you take a look at the code after the preprocessor you will be able to
> understand the DSL which is used in AlphaMacros.h for
> "DEFINE_ALPHA_MASKBLIT".
>
> There are a bunch of files in
> java.desktop/share/native/libawt/java2d/loops/. Some of them contain the
> general code, like LoopMacros.h and AlphaMacros.h; others contain the
> implementation for some specific types.
>
> For example, take a look at IntRgb.c.
> It has two parts:
> - The array IntRgbPrimitives, which contains the list of supported
> operations (it registers the functions which should be called in
> MaskBlit.c for some particular types). For example, it contains
> REGISTER_ALPHA_MASKBLIT from/to different types.
> - Definitions of the functions, like DEFINE_SRCOVER_MASKBLIT(IntArgb,
> IntRgb, 4ByteArgb); This macro provides a function which supports the
> maskblit IntArgb->IntRgb.
>
> So to understand how it works you need to trace these calls:
> - MaskBlit.java -> MaskBlit(.....)
> - MaskBlit.c -> *pPrim->funcs.maskblit
> - The function which is generated from DEFINE_SRCOVER_MASKBLIT for a
> particular type.
>
> Note that if for some reason there is no specific implementation generated
> by DEFINE_SRCOVER_MASKBLIT, the General MaskBlit from MaskBlit.java will be
> used, and it is quite slow.
>
> I am on the road of investigation...
>

Excellent! Maybe you should compare the preprocessor outputs between gcc 4.3.2 (JDK8), which was faster, and gcc 4.8.4 (JDK9)!
I guess it is related to the loop vectorization (SIMD), which seems slower in gcc >= 4.4 (known regression?)

Good luck & keep me informed about your investigations,
Laurent