Phil,

I agree that improving memory usage while maintaining performance is a complex
issue at the JDK level: applications can use java2d pisces in very different
contexts: a Swing app (client with only the EDT thread), a server-side
application (multi-threaded, headless), etc.

For the moment, I have spent most of my time understanding the different
classes in java2d.pisces and analyzing memory usage / performance using
J2DBench (all graphics tests).

In my Swing application, pisces produces a lot of garbage (GC pressure), but on
the server side the GC overhead can be even higher if several threads use
pisces.

Pisces uses memory in different ways:
- fixed-size arrays (dasher, stroker)
- dynamic arrays (edges ...) and rowAARLE (a very big one for big shapes)

For the moment I am trying to avoid memory waste (pooling or keeping
references) without any memory constraint (no eviction), but I agree it is an
important aspect for server-side applications.

To avoid concurrency issues, I use a ThreadLocal context named RendererContext
to keep a few temporary arrays (float6 and a BIG rowAARLE instance); there are
also dynamic IntArrayCache and FloatArrayCache classes, which have several
pools divided into buckets (256, 1024, 4096, 16384, 32768), each containing
only a few instances.
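
To illustrate the idea only (this is not the actual patch code), such a
bucketed cache could look roughly like the sketch below; the class name and
bucket sizes follow the description above, while MAX_PER_BUCKET and the method
names are assumptions:

    import java.util.ArrayDeque;

    // Illustrative sketch of a per-thread bucketed int[] cache (not shared
    // between threads, so no synchronization is needed).
    final class IntArrayCache {
        private static final int[] BUCKET_SIZES = {256, 1024, 4096, 16384, 32768};
        private static final int MAX_PER_BUCKET = 8; // keep only a few instances

        @SuppressWarnings("unchecked")
        private final ArrayDeque<int[]>[] buckets = new ArrayDeque[BUCKET_SIZES.length];

        IntArrayCache() {
            for (int i = 0; i < buckets.length; i++) {
                buckets[i] = new ArrayDeque<>(MAX_PER_BUCKET);
            }
        }

        int[] getArray(final int length) {
            final int b = bucketFor(length);
            if (b < 0) {
                return new int[length]; // too large: never cached
            }
            final int[] array = buckets[b].pollLast();
            return (array != null) ? array : new int[BUCKET_SIZES[b]];
        }

        void putArray(final int[] array) {
            final int b = bucketFor(array.length);
            if (b >= 0 && array.length == BUCKET_SIZES[b]
                    && buckets[b].size() < MAX_PER_BUCKET) {
                buckets[b].offerLast(array); // recycle
            } // otherwise let the GC reclaim it
        }

        private static int bucketFor(final int length) {
            for (int i = 0; i < BUCKET_SIZES.length; i++) {
                if (length <= BUCKET_SIZES[i]) {
                    return i;
                }
            }
            return -1;
        }
    }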

To get the best performance, I studied the pisces code so that only the used
parts of an array are cleared when recycling, or so that dirty arrays are used
directly (only rowAARLE[...][1] is cleared).
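
For example (a sketch only, reusing the hypothetical putArray method from the
cache sketch above), the idea is:

    // Sketch: return an array to the pool, clearing only the part that was
    // actually written; "dirty" arrays are recycled as-is and must be fully
    // overwritten by the next user before being read.
    static void recycleIntArray(final IntArrayCache cache, final int[] array,
                                final int usedLength, final boolean dirty) {
        if (!dirty) {
            // the remainder is still zero from the previous cycle
            java.util.Arrays.fill(array, 0, usedLength, 0);
        }
        cache.putArray(array);
    }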

I think Andrea's proposal is interesting: maybe add some system properties to
give hints (low memory footprint, use the cache or not ...).
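
Something along these lines could work; the property names below are invented,
only to show the kind of hints that could be exposed:

    // Hypothetical tuning hints read once at startup (names are not real):
    static final boolean USE_ARRAY_CACHE = Boolean.parseBoolean(
            System.getProperty("sun.java2d.pisces.useArrayCache", "true"));
    static final boolean LOW_MEMORY_FOOTPRINT = Boolean.parseBoolean(
            System.getProperty("sun.java2d.pisces.lowMemoryFootprint", "false"));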

2013/3/28 Phil Race <philip.r...@oracle.com>

> Maintaining a pool of objects might be an appropriate thing for an
> application, but it's a lot trickier for the platform as the application's
> usage pattern or intent is largely unknown. Weak references or soft
> references might be of use, but weak references usually go away even at the
> next incremental GC and soft references tend to not go away at all until
> you run out of heap.
>

Agreed; for the moment, a pool eviction policy is not implemented, but it is
kept in mind.
FYI: each RendererContext (per thread) has its own array pools (not shared),
which could have different caching policies: for instance, the AWT / EDT
(repaint) thread could use a large cache while other threads do not use array
caching at all.
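
As a rough sketch of that idea (again not the actual patch code, and reusing
the hypothetical IntArrayCache from the sketch above; a FloatArrayCache would
be analogous):

    // Each thread lazily gets its own RendererContext, so the pools and the
    // scratch arrays are never shared between threads.
    final class RendererContext {
        final IntArrayCache intCache = new IntArrayCache();
        final float[] float6 = new float[6]; // reused temporary array

        private static final ThreadLocal<RendererContext> CONTEXT =
                ThreadLocal.withInitial(RendererContext::new);

        static RendererContext getContext() {
            return CONTEXT.get();
        }
    }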


> You may well be right that always doubling the array size may be too
> simplistic,
> but it would need some analysis of the code and its usage to see how much
> better we can do.


There are two parts:
- initial array size for dynamic arrays: difficult to estimate, but for now set
to a very low capacity (8 / 50 ...) to avoid memory waste for rectangle / line
shapes. In my patch, I have defined MIN_ARRAY_SIZE = 128 (array pool) to avoid
too much resizing since I am recycling arrays.
- growth: I use x4 instead of x2 to reduce the number of array copies (see the
sketch after this list).
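
As a sketch of that growth policy (in the actual patch the resize presumably
goes through the array cache; here the array is simply allocated directly, and
MIN_ARRAY_SIZE matches the value quoted above):

    static final int MIN_ARRAY_SIZE = 128;

    // Grow by a factor of 4 instead of 2 to reduce the number of copies,
    // and never return an array smaller than MIN_ARRAY_SIZE.
    static int[] widenArray(final int[] in, final int curSize, final int numToAdd) {
        final int needed = curSize + numToAdd;
        if (in.length >= needed) {
            return in;
        }
        final int newSize = Math.max(MIN_ARRAY_SIZE, 4 * needed);
        return java.util.Arrays.copyOf(in, newSize);
    }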

Laurent



2013/3/28 Phil Race <philip.r...@oracle.com>

> Maintaining a pool of objects might be an appropriate thing for an
> application, but it's a lot trickier for the platform as the application's
> usage pattern or intent is largely unknown. Weak references or soft
> references might be of use, but weak references usually go away even at the
> next incremental GC and soft references tend to not go away at all until
> you run out of heap.
>
> You may well be right that always doubling the array size may be too
> simplistic,
> but it would need some analysis of the code and its usage to see how much
> better we can do.
>
>
> > Apparently, Arrays.fill is always faster (sizes from 10 to 10 000)!
> > I suspect hotspot optimizes its code and uses native functions, doesn't it?
>
> I suppose there is some hotspot magic involved to recognise and intrinsify
> this
> method, since the source code looks like a plain old for loop.
>
> -phil.
>
>
>
> On 3/26/2013 4:00 AM, Laurent Bourgès wrote:
>
>> Dear all,
>>
>> First, I recently joined the OpenJDK contributors, and I plan to fix the
>> java2D pisces code in my spare time.
>>
>> I have a full-time job on Aspro2: http://www.jmmc.fr/aspro; it is an
>> application to prepare astronomical observations at VLTI / CHARA and is
>> widely used in our community (200 users): it provides scientific
>> computations (observability, model images using complex numbers ...) and
>> zoomable plots thanks to jFreeChart.
>>
>> Aspro2 is known to be very efficient (computation parallelization) and I
>> often do profiling using the NetBeans profiler or VisualVM.
>>
>> To fix the huge memory usage by java2d.pisces, I started implementing an
>> efficient ArrayCache (int[] and float[]) (thread-local to avoid concurrency
>> problems):
>> - array sizes between 10 and 10000 (more small arrays used than big ones)
>> - resizing support (Arrays.copyOf) without wasting arrays
>> - reentrance, i.e. many arrays are used at the same time (java2D Pisces
>> stroke / dash creates many segments to render)
>> - GC / heap friendly, i.e. support cache eviction and avoid consuming too
>> much memory
>>
>> I know object pooling is considered inefficient with recent VMs (the GC is
>> better), but I think it is counterproductive to create so many int[] arrays
>> in java2d.pisces and let the GC collect so much wasted memory.
>>
>> Has anyone implemented such an (open source) array cache (core-libs)?
>> Opinions are welcome (but avoid "trolls").
>>
>> Moreover, sun.java2d.pisces.Helpers.widenArray() performs a lot of array
>> resizing / copying (Arrays.copyOf) that I mostly want to avoid:
>>     // These use a hardcoded factor of 2 for increasing sizes. Perhaps this
>>     // should be provided as an argument.
>>     static float[] widenArray(float[] in, final int cursize,
>>                               final int numToAdd) {
>>         if (in.length >= cursize + numToAdd) {
>>             return in;
>>         }
>>         return Arrays.copyOf(in, 2 * (cursize + numToAdd));
>>     }
>>
>>     static int[] widenArray(int[] in, final int cursize,
>>                             final int numToAdd) {
>>         if (in.length >= cursize + numToAdd) {
>>             return in;
>>         }
>>         return Arrays.copyOf(in, 2 * (cursize + numToAdd));
>>     }
>>
>> Thanks to Peter Levart, I use his micro-bench tool
>> (https://github.com/plevart/micro-bench/tree/v2) to benchmark ArrayCache
>> operations, and J2DBench to test java2d performance.
>>
>> What is the fastest way to clear (part of) an array, i.e. fill it with 0:
>> - public static void fill(int[] a, int fromIndex, int toIndex, int val)
>> - public static native void arraycopy(Object src, int srcPos, Object dest,
>>   int destPos, int length);
>> - unsafe.setMemory(array, Unsafe.ARRAY_INT_BASE_OFFSET, 512 * SIZEOF_INT,
>>   (byte) 0)
>>
>> Apparently, Arrays.fill is always faster (sizes from 10 to 10 000)!
>> I suspect hotspot optimizes its code and uses native functions, doesn't it?
>>
>> Benchmark results (JVM: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b22],
>> run duration: 5 000 ms per test, 4 logical CPUs). Tavg in ns/op for the
>> measured phase (warm-up runs gave similar values); σ shown for the 2-thread
>> runs:
>>
>>   array size | test                | 1 thread | 2 threads (σ)
>>   -----------+---------------------+----------+-----------------
>>   int[1]     | ZeroFill            |     4.43 |     5.55  (0.16)
>>   int[1]     | FillArraySystemCopy |     6.19 |     7.80  (0.10)
>>   int[1]     | FillArrayUnsafe     |    22.42 |    28.21  (0.88)
>>   int[100]   | ZeroFill            |    16.03 |    19.32  (0.46)
>>   int[100]   | FillArraySystemCopy |    14.09 |    31.15  (4.04)
>>   int[100]   | FillArrayUnsafe     |    52.19 |    70.87  (0.71)
>>   int[10000] | ZeroFill            |  1235.81 |  1325.11  (7.01)
>>   int[10000] | FillArraySystemCopy |  2105.21 |  2160.33 (13.74)
>>   int[10000] | FillArrayUnsafe     |  3068.34 |  3296.13 (34.97)
>>
>>
>> PS: java.awt.geom.Path2D also has memory allocation issues:
>>         void needRoom(boolean needMove, int newCoords) {
>>             if (needMove && numTypes == 0) {
>>                 throw new IllegalPathStateException("missing initial moveto "+
>>                                                     "in path definition");
>>             }
>>             int size = pointTypes.length;
>>             if (numTypes >= size) {
>>                 int grow = size;
>>                 if (grow > EXPAND_MAX) {
>>                     grow = EXPAND_MAX;
>>                 }
>>                 pointTypes = Arrays.copyOf(pointTypes, size+grow);
>>             }
>>             size = floatCoords.length;
>>             if (numCoords + newCoords > size) {
>>                 int grow = size;
>>                 if (grow > EXPAND_MAX * 2) {
>>                     grow = EXPAND_MAX * 2;
>>                 }
>>                 if (grow < newCoords) {
>>                     grow = newCoords;
>>                 }
>>                 floatCoords = Arrays.copyOf(floatCoords, size+grow);
>>             }
>>         }
>>
>> Best regards,
>> Laurent
>>
>
>
