Hi Laurent,

The allocations will always show up in a heap profiler; I don't know of any 
way to keep them from showing up, even if they are stack allocated. But I 
don't think stack allocation is the issue here: small allocations come out of 
a fast young generation that costs almost nothing to allocate from and nearly 
nothing to clean up.  They are actually getting allocated and GC'd, but the 
process is optimized.

The only way to tell is to benchmark and see which changes make a difference 
and which are in the noise (or, in some odd counter-intuitive cases, 
counter-productive)...
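
As a rough illustration of that kind of comparison, here is a minimal sketch 
that times a fresh int[4] per call against a per-thread cached one. The class 
and method names (AllocBenchSketch, freshBox, cachedBox) are made up for this 
example and are not part of AAShapePipe; it assumes Java 8 or later for 
ThreadLocal.withInitial. A naive timing loop like this is easily skewed by the 
JIT, so treat it only as a starting point next to J2DBench or MapBench runs.

```java
final class AllocBenchSketch {
    // One reusable int[4] per thread, standing in for a cached bounding box.
    private static final ThreadLocal<int[]> CACHED =
            ThreadLocal.withInitial(() -> new int[4]);

    // Variant A: allocate a fresh int[4] on every call.
    static long freshBox(int i) {
        int[] box = new int[4];
        fill(box, i);
        return area(box);
    }

    // Variant B: reuse the per-thread int[4].
    static long cachedBox(int i) {
        int[] box = CACHED.get();
        fill(box, i);
        return area(box);
    }

    private static void fill(int[] box, int i) {
        box[0] = i;
        box[1] = i + 1;
        box[2] = i + 32;
        box[3] = i + 33;
    }

    private static long area(int[] box) {
        return (long) (box[2] - box[0]) * (box[3] - box[1]);
    }

    // Runs one variant and returns a checksum so the work cannot be discarded.
    static long run(boolean cached, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += cached ? cachedBox(i) : freshBox(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        // Warm up both paths so the JIT has compiled them before timing.
        for (int w = 0; w < 5; w++) {
            run(false, n);
            run(true, n);
        }
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = run(false, n);
            long t1 = System.nanoTime();
            long b = run(true, n);
            long t2 = System.nanoTime();
            System.out.printf("fresh: %d ms, cached: %d ms (checksums %d / %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

If the two timings are within run-to-run noise, the caching is probably not 
buying anything for objects this small.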

                        ...jim

On 4/9/2013 10:34 AM, Laurent Bourgès wrote:
Dear Jim,

I admit I only looked at the NetBeans memory profiler's output: no more 
megabytes allocated!

The main question is: how can I find out how the GC / HotSpot deals with such 
small allocations? Is there a JVM flag that shows the real allocations, as 
jmap -histo does?


    Quick questions - which benchmarks were run before/after?  I see a lot of 
benchmark runs in your Pisces improvement thread, but none here.


Agreed; I can try running J2DBench on this fix only. I generally run Andrea's 
MapBench, as it appeared more complex and uses multiple threads.

    Also, this should be tested on multiple platforms, preferably Linux, 
Windows and Mac to see how it is affected by differences in the platform 
runtimes and threading (hopefully minimal).


That is more difficult for me: at work I can use a Mac running 10.8, and I can 
run Windows XP within VirtualBox (but that is not very representative).

Don't you have any test platforms at Oracle to run such tests / benchmarks?

    Finally, HotSpot is supposed to deal very well with small thread-local 
allocations like the int[4] and Rectangle2D that you optimized.  Was it 
necessary to cache those at all?  I'm sure the statistics for the allocations 
show up in a memory profile, but that doesn't mean it is costing us anything - 
ideally such small allocations are as fast as free and having to deal with 
caching them in a context will actually lose performance.  It may be that the 
tile caching saved enough that it might have masked unnecessary or detrimental 
changes for the smaller objects...


I repeat my question: how can I know at runtime how HotSpot optimizes the 
AAShapePipe code (allocations ...)? Can HotSpot do stack allocation? Is it 
explained somewhere (allocation size threshold)?

Maybe the -verbose:gc output would help?
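
One way to look at real allocations from inside the process, as a sketch only: 
HotSpot exposes a per-thread allocated-bytes counter through 
com.sun.management.ThreadMXBean. The AllocationProbe class below is a 
hypothetical helper written for this message; it returns -1 where the counter 
is unavailable (for example on a non-HotSpot JVM). My understanding is that 
allocations removed by the JIT would not be counted here, so comparing this 
figure with the profiler's numbers might hint at what HotSpot optimized away.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbe {
    // Returns the bytes allocated by the current thread while running the task,
    // or -1 if this JVM does not expose the per-thread allocation counter.
    static long allocatedBytes(Runnable task) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        if (!(std instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) std;
        if (!bean.isThreadAllocatedMemorySupported()
                || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run();
        long after = bean.getThreadAllocatedBytes(id);
        return after - before;
    }

    public static void main(String[] args) {
        // Illustrative workload: many short-lived int[4] arrays.
        long bytes = allocatedBytes(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                int[] box = new int[4];
                box[0] = i;
            }
        });
        System.out.println("allocated bytes: " + bytes);
    }
}
```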

Finally, I spent a lot of time on the Pisces renderer and on running MapBench 
to show performance gains.

Thanks for your interesting feedback,

Laurent

On 4/5/2013 5:20 AM, Laurent Bourgès wrote:

    Dear java2d members,

    I found some problems in java2d.pipe.AAShapePipe related to both 
concurrency & memory usage:
    - concurrency issue related to the static theTile field: only 1 tile is 
cached, so a new byte[] is created for other threads at each call to 
renderTiles()
    - excessive memory usage (byte[] for the tile, int[] and rectangle): at 
each call to renderPath / renderTiles, several small objects are created 
(never cached), which leads to hundreds of megabytes that the GC must deal with

    Here are profiling screenshots:
    - 4 threads drawing on their own buffered image (MapBench test):
    http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_byte_tile.png

    - excessive int[] / Rectangle creation:
    http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_int_bbox.png
    http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_rectangle_bbox.png

    Here is the proposed patch:
    http://jmmc.fr/~bourgesl/share/AAShapePipe/webrev-1/

    I applied a simple solution: use a ThreadLocal or a ConcurrentLinkedQueue 
(see the useThreadLocal flag) to cache one AAShapePipeContext per thread (2K max).
    As its memory footprint is very small, I recommend using ThreadLocal.
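
    To illustrate the ThreadLocal variant, here is a minimal sketch and not the 
code from the webrev. The class name TileContextSketch, its fields and the 
32x32 tile size are assumptions chosen for the example; it assumes Java 8 or 
later for ThreadLocal.withInitial.

```java
final class TileContextSketch {
    // Assumed tile size, for illustration only.
    static final int TILE_BYTES = 32 * 32;

    // Reusable buffers owned by exactly one thread.
    final byte[] tile = new byte[TILE_BYTES];
    final int[] bbox = new int[4];

    // Each thread lazily gets its own context on first use.
    private static final ThreadLocal<TileContextSketch> PER_THREAD =
            ThreadLocal.withInitial(TileContextSketch::new);

    private TileContextSketch() {
    }

    // Returns the calling thread's context; no locking is needed because
    // the instance is never shared between threads.
    static TileContextSketch current() {
        return PER_THREAD.get();
    }

    public static void main(String[] args) {
        TileContextSketch ctx = TileContextSketch.current();
        ctx.bbox[0] = 10; // the buffers are reused across calls on this thread
        System.out.println("same instance on this thread: "
                + (ctx == TileContextSketch.current()));
    }
}
```

    A rendering call would fetch the context once via current() and reuse 
its tile and bbox buffers instead of allocating new ones.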

    Is it necessary to use Soft/Weak references to avoid excessive memory 
usage for such a cache?

    Is there any class dedicated to such a cache (ThreadLocal with cache 
eviction, or ConcurrentLinkedQueue using WeakReference)?
    I think it could be very useful at the JDK level to have such a feature 
(i.e. a generic "GC-friendly" cache).
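
    What I have in mind could look like the sketch below: a ThreadLocal whose 
value is held through a SoftReference, so the GC may reclaim it under memory 
pressure and it is simply rebuilt on the next access. SoftThreadLocalCache is 
a hypothetical name; as far as I know there is no such class in the JDK.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

final class SoftThreadLocalCache<T> {
    private final Supplier<T> factory;
    // Each thread keeps only a soft reference to its cached value.
    private final ThreadLocal<SoftReference<T>> slot = new ThreadLocal<>();

    SoftThreadLocalCache(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the calling thread's cached value, recreating it if it was
    // never created or if the GC has cleared the soft reference.
    T get() {
        SoftReference<T> ref = slot.get();
        T value = (ref != null) ? ref.get() : null;
        if (value == null) {
            value = factory.get();
            slot.set(new SoftReference<>(value));
        }
        return value;
    }
}
```

    The caller must keep a strong reference to the returned value for the 
duration of the rendering call, otherwise it could be cleared mid-use.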

    Regards,
    Laurent

