Hi Laurent,
The allocations will always show up in a heap profiler; I don't know of any way
of having them not show up if they are stack allocated. But I don't think that
stack allocation is the issue here: small allocations come out of a fast
generation that costs almost nothing to allocate from and nearly nothing to
clean up. They are actually getting allocated and GC'd, but the process is
optimized.
The only way to tell is to benchmark and see which changes make a difference
and which are in the noise (or, in some odd counter-intuitive cases,
counter-productive)...
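
A rough way to run that comparison is sketched below. It is only an
illustration: the class name SmallAllocBench and the cached int[4] are
hypothetical stand-ins for the patch's context, not code from the webrev, and
a hand-rolled System.nanoTime() loop like this is easily distorted by JIT
warm-up and dead-code elimination, so a proper harness should be preferred
for real numbers.

```java
final class SmallAllocBench {
    // Hypothetical per-thread cache of a 4-int bounding box (an assumption,
    // standing in for whatever the patch actually caches).
    private static final ThreadLocal<int[]> CACHED_BBOX = new ThreadLocal<int[]>() {
        @Override
        protected int[] initialValue() {
            return new int[4];
        }
    };

    // Allocates a fresh int[4] on every iteration.
    static long freshAlloc(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] bbox = new int[4];
            bbox[0] = i;
            bbox[3] = i + 1;
            sum += bbox[0] + bbox[3]; // keep the result live
        }
        return sum;
    }

    // Reuses one int[4] per thread through a ThreadLocal lookup.
    static long cached(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] bbox = CACHED_BBOX.get();
            bbox[0] = i;
            bbox[3] = i + 1;
            sum += bbox[0] + bbox[3];
        }
        return sum;
    }

    public static void main(String[] args) {
        final int n = 50_000_000;
        // Warm-up so both loops are JIT-compiled before timing.
        for (int w = 0; w < 5; w++) {
            freshAlloc(n / 10);
            cached(n / 10);
        }
        long t0 = System.nanoTime();
        long a = freshAlloc(n);
        long t1 = System.nanoTime();
        long b = cached(n);
        long t2 = System.nanoTime();
        System.out.println("fresh : " + (t1 - t0) / 1000000 + " ms (sum=" + a + ")");
        System.out.println("cached: " + (t2 - t1) / 1000000 + " ms (sum=" + b + ")");
    }
}
```

Both loops compute the same sum, so any timing difference comes from the
allocation strategy; which one wins depends on the JVM and platform.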
...jim
On 4/9/2013 10:34 AM, Laurent Bourgès wrote:
Dear Jim,
I admit I only looked at the NetBeans memory profiler's output: no more
megabytes allocated!
The main question is: how can I know how the GC / HotSpot deals with such small
allocations? Is there any JVM flag that shows the real allocations, as
jmap -histo does?
Quick questions - which benchmarks were run before/after? I see a lot of
benchmark running in your Pisces improvement thread, but none here.
Agreed; I can try running j2dBench on this fix only. I generally run Andrea's
MapBench as it appeared more complex and uses multiple threads.
Also, this should be tested on multiple platforms, preferably Linux,
Windows and Mac to see how it is affected by differences in the platform
runtimes and threading (hopefully minimal).
That is more difficult for me: at work I can use a Mac (OS X 10.8) and I can
run Windows XP within VirtualBox (but it is not very representative).
Don't you have any test platform at Oracle to run such tests / benchmarks?
Finally, Hotspot is supposed to deal very well with small thread-local
allocations like the int[4] and Rectangle2D that you optimized. Was it
necessary to cache those at all? I'm sure the statistics for the allocations
show up in a memory profile, but that doesn't mean it is costing us anything -
ideally such small allocations are as fast as free and having to deal with
caching them in a context will actually lose performance. It may be that the
tile caching saved enough that it might have masked unnecessary or detrimental
changes for the smaller objects...
I repeat my question: how can I know at runtime how HotSpot optimizes the
AAShapePipe code (allocations ...)? Can HotSpot do stack allocation? Is it
explained somewhere (allocation size threshold)?
Maybe -verbose:gc output may help?
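
One possible way to observe real allocations at runtime, sketched here under
assumptions: HotSpot exposes a per-thread allocated-bytes counter through
com.sun.management.ThreadMXBean. The class name AllocatedBytesProbe and its
helper method are hypothetical; the counter is HotSpot-specific and may be
unsupported or disabled on other JVMs, in which case the sketch returns -1.

```java
import java.lang.management.ManagementFactory;

final class AllocatedBytesProbe {
    // Published sink so the allocation in main() cannot be optimized away.
    static volatile Object sink;

    // Returns the number of bytes the current thread allocated while running
    // 'work', or -1 if the JVM does not expose the counter.
    static long allocatedBytesDuring(Runnable work) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        if (!(std instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) std;
        if (!bean.isThreadAllocatedMemorySupported()
                || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        work.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytesDuring(new Runnable() {
            public void run() {
                sink = new byte[1 << 20]; // 1 MiB array that escapes via 'sink'
            }
        });
        System.out.println("allocated bytes: " + bytes);
    }
}
```

Wrapping a rendering call in allocatedBytesDuring() and comparing runs with
escape analysis on and off (-XX:+DoEscapeAnalysis / -XX:-DoEscapeAnalysis)
may hint at whether the JIT is eliminating the small allocations.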
Finally, I spent a lot of time on the Pisces renderer and on running MapBench
to show the performance gains.
Thanks for your interesting feedback,
Laurent
On 4/5/2013 5:20 AM, Laurent Bourgès wrote:
Dear java2d members,
I found some problems in java2d.pipe.AAShapePipe related to both
concurrency & memory usage:
- concurrency issue related to the static theTile field: only 1 tile is cached,
so a new byte[] is created for other threads at each call to renderTiles()
- excessive memory usage (byte[] for tile, int[] and rectangle): at each
call to renderPath / renderTiles, several small objects are created (never
cached), which leads to hundreds of megabytes that the GC must deal with
Here are profiling screenshots:
- 4 threads drawing on their own buffered image (MapBench test):
http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_byte_tile.png
- excessive int[] / Rectangle creation:
http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_int_bbox.png
http://jmmc.fr/~bourgesl/share/AAShapePipe/AAShapePipe_rectangle_bbox.png
Here is the proposed patch:
http://jmmc.fr/~bourgesl/share/AAShapePipe/webrev-1/
I applied a simple solution: use a ThreadLocal or a ConcurrentLinkedQueue
(see the useThreadLocal flag) to cache one AAShapePipeContext per thread (2K max).
As its memory footprint is very small, I recommend using ThreadLocal.
Is it necessary to use a Soft/Weak reference to avoid excessive memory usage
for such a cache?
Is there any class dedicated to such a cache (ThreadLocal with cache eviction,
or ConcurrentLinkedQueue using WeakReference)?
I think it could be very useful at the JDK level to have such a feature (i.e. a
generic "GC friendly" cache).
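
To make the Soft reference question concrete, here is a minimal sketch of a
per-thread cache whose entry the GC may reclaim under memory pressure. It is
not the code from the webrev: the class PerThreadContextCache, the nested
Context type, and its field sizes are hypothetical placeholders for
AAShapePipeContext.

```java
import java.lang.ref.SoftReference;

final class PerThreadContextCache {
    // Hypothetical stand-in for AAShapePipeContext; the fields and their
    // sizes are assumptions chosen only for illustration.
    static final class Context {
        final byte[] tile = new byte[1024];
        final int[] bbox = new int[4];
    }

    // One softly referenced context per thread: the GC may clear it when
    // memory is tight, and it is rebuilt on the next call to get().
    private static final ThreadLocal<SoftReference<Context>> CACHE =
            new ThreadLocal<SoftReference<Context>>();

    static Context get() {
        SoftReference<Context> ref = CACHE.get();
        Context ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) {
            // First use on this thread, or the GC cleared the reference.
            ctx = new Context();
            CACHE.set(new SoftReference<Context>(ctx));
        }
        return ctx;
    }
}
```

The caller holds a strong reference for the duration of a rendering call, so
the context cannot disappear mid-use; whether the extra indirection is worth
it for a 2K footprint is something only a benchmark can settle.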
Regards,
Laurent