Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
------------------------------------------------------------------------
Key: PIVOT-778
URL: https://issues.apache.org/jira/browse/PIVOT-778
Project: Pivot
Issue Type: Improvement
Components: wtk
Affects Versions: 2.0
Reporter: Piotr Kołaczkowski
Priority: Minor
We are writing a game of sorts that continually calls the Component.repaint
method, at 60 FPS. We noticed excessive CPU usage, although the actual amount
of painting done by our component (in an overridden Panel.paint) is very
small. The profiler pointed us to the paintVolatileBuffered method in
DisplayHost. What you are doing there is:
1. obtain a fresh BufferedImage the size of the current clip region; for a
full-screen game that can be about 1280x1024. This is 1.3 Mpix x
4 bytes/pixel = 5.2 MB of raw data, allocated from a probably cold memory
region (not in the L2 cache)
2. then you call the actual paint on that buffered image (this touches at
least 5.2 MB again)
3. then you copy that to the onscreen buffer (which means touching the 5.2 MB
a third time)
4. if the GC kicks in between steps 1 and 3, it has to move the BufferedImage
in memory to compact the young generation (touching the 5.2 MB a fourth time)
The whole process means allocating 5.2 MB from cold memory for every frame and
touching about 20 MB per frame.
At 60 FPS this adds up to an allocation rate of ~300 MB/s and a memory
throughput of ~1.2 GB/s.
It also makes the GC go crazy.
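The figures above can be checked with a few lines of Java. This is only a rough
sketch: the class and method names are made up for illustration, and it assumes
4 bytes per pixel and the four passes over the buffer listed in steps 1-4.

```java
/** Back-of-the-envelope cost of recreating the offscreen buffer every frame. */
final class FrameCost {
    /** Bytes needed for one uncompressed frame of the given size. */
    static long bytesPerFrame(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Bytes moved per second when each frame is touched the given number of times. */
    static long bytesPerSecond(long bytesPerFrame, int touchesPerFrame, int fps) {
        return bytesPerFrame * touchesPerFrame * fps;
    }

    public static void main(String[] args) {
        long frame = bytesPerFrame(1280, 1024, 4);        // 5,242,880 bytes (~5.2 MB)
        long allocated = bytesPerSecond(frame, 1, 60);    // 314,572,800 bytes/s (~300 MB/s)
        long touched = bytesPerSecond(frame, 4, 60);      // 1,258,291,200 bytes/s (~1.2 GB/s)
        System.out.println("bytes per frame:      " + frame);
        System.out.println("allocated per second: " + allocated);
        System.out.println("touched per second:   " + touched);
    }
}
```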
We have found that caching the buffer between the subsequent paint calls
improves performance a lot:
<code>
/** Stores the prepared offscreen buffer */
private BufferedImage bufferedImage;

/**
 * Attempts to paint the display using an offscreen buffer.
 *
 * @param graphics
 * The source graphics context.
 *
 * @return
 * <tt>true</tt> if the display was painted using the offscreen
 * buffer; <tt>false</tt>, otherwise.
 */
private boolean paintBuffered(Graphics2D graphics) {
    boolean painted = false;

    // Paint the display into an offscreen buffer
    GraphicsConfiguration gc = graphics.getDeviceConfiguration();
    java.awt.Rectangle clipBounds = graphics.getClipBounds();

    // Reuse the cached buffer unless it is too small for the current clip
    if (bufferedImage == null ||
        bufferedImage.getWidth() < clipBounds.width ||
        bufferedImage.getHeight() < clipBounds.height) {
        bufferedImage = gc.createCompatibleImage(clipBounds.width,
            clipBounds.height,
            Transparency.OPAQUE);
    }

    if (bufferedImage != null) {
        Graphics2D bufferedImageGraphics =
            (Graphics2D)bufferedImage.getGraphics();
        bufferedImageGraphics.setClip(0, 0, clipBounds.width,
...
</code>
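Since the snippet above is cut short, here is a self-contained sketch of the
same idea that can be run outside Pivot. It is not the DisplayHost code:
OffscreenBufferCache, acquire, paint and allocationCount are hypothetical
names, the buffer is created with the plain BufferedImage constructor instead
of GraphicsConfiguration.createCompatibleImage, and the painter is assumed to
cover the whole clip opaquely (the reused buffer still holds the previous
frame). Unlike the snippet, it grows the buffer to the largest width and
height seen so far, so alternating wide and tall clips do not force
reallocations.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

/** Hypothetical helper: keeps one offscreen buffer and reallocates only when it is too small. */
final class OffscreenBufferCache {
    private BufferedImage buffer;
    private int allocations;

    /** Returns a buffer at least width x height, reusing the cached one when possible. */
    BufferedImage acquire(int width, int height) {
        if (buffer == null || buffer.getWidth() < width || buffer.getHeight() < height) {
            // Grow to the largest size seen so far in each dimension
            int w = buffer == null ? width : Math.max(buffer.getWidth(), width);
            int h = buffer == null ? height : Math.max(buffer.getHeight(), height);
            buffer = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            allocations++;
        }
        return buffer;
    }

    /** Number of times a new buffer had to be allocated. */
    int allocationCount() {
        return allocations;
    }

    /** Paints the clip region through the cached buffer and copies only that region to the target. */
    void paint(Graphics2D target, Rectangle clip, Consumer<Graphics2D> painter) {
        BufferedImage image = acquire(clip.width, clip.height);
        Graphics2D g = image.createGraphics();
        try {
            // The buffer may be larger than the clip, so restrict painting to the used corner
            g.setClip(0, 0, clip.width, clip.height);
            // Map display coordinates onto the buffer's origin
            g.translate(-clip.x, -clip.y);
            painter.accept(g);
        } finally {
            g.dispose();
        }
        // Copy only the clip-sized corner of the buffer, unscaled
        target.drawImage(image,
            clip.x, clip.y, clip.x + clip.width, clip.y + clip.height,
            0, 0, clip.width, clip.height, null);
    }

    public static void main(String[] args) {
        BufferedImage screen = new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
        Graphics2D screenGraphics = screen.createGraphics();
        OffscreenBufferCache cache = new OffscreenBufferCache();
        Consumer<Graphics2D> painter = g -> {
            g.setColor(Color.RED);
            g.fillRect(0, 0, 640, 480);
        };
        // Sixty "frames" with the same clip reuse a single buffer
        for (int frame = 0; frame < 60; frame++) {
            cache.paint(screenGraphics, new Rectangle(0, 0, 640, 480), painter);
        }
        screenGraphics.dispose();
        System.out.println("allocations: " + cache.allocationCount());
    }
}
```

The nine-argument drawImage matters here: because the cached buffer can be
larger than the current clip, only its top-left clip-sized corner should be
copied to the screen.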
Advantages:
1. it avoids the costly allocation of a large object from a memory region that
is possibly not cached
2. after a few repaints the GC promotes this object to the tenured generation,
so the young generation collector becomes much more efficient (longer times
between runs)
3. the image probably stays in the L2 or L3 cache most of the time, which saves
memory bandwidth and speeds up painting
Disadvantages:
1. it holds on to some memory that is probably not needed all the time, e.g.
when the app doesn't need to repaint anything large; however, this is almost
completely outweighed by the excessive GC overhead of continuously recreating
the offscreen buffered image
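If the retained memory turns out to be a concern, one possible mitigation (not
part of the change we tested, just a sketch) is to hold the cached buffer
through a java.lang.ref.SoftReference, which the garbage collector may clear in
response to memory demand. SoftBufferCache and acquire are hypothetical names,
and the plain BufferedImage constructor again stands in for
createCompatibleImage.

```java
import java.awt.image.BufferedImage;
import java.lang.ref.SoftReference;

/** Hypothetical variant: the cached buffer is only softly reachable between paints. */
final class SoftBufferCache {
    private SoftReference<BufferedImage> ref = new SoftReference<>(null);

    /** Returns a buffer at least width x height, reallocating if it was cleared or is too small. */
    BufferedImage acquire(int width, int height) {
        BufferedImage image = ref.get(); // null if never allocated or cleared by the GC
        if (image == null || image.getWidth() < width || image.getHeight() < height) {
            image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ref = new SoftReference<>(image);
        }
        return image;
    }

    public static void main(String[] args) {
        SoftBufferCache cache = new SoftBufferCache();
        BufferedImage frame = cache.acquire(1280, 1024);
        System.out.println(frame.getWidth() + "x" + frame.getHeight());
    }
}
```

Whether this is worth the extra indirection would have to be measured; a
buffer that gets cleared and recreated often brings back the allocation cost
described above.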
Anyway, we observed about a 2-4x performance increase from this simple change:
now, when running at 60 FPS, painting uses only about 25% of the CPU, and the
rest can be used by the application logic (AI, etc.). Previously 60 FPS was
probably the most we could achieve on a 2.2 GHz Core 2 Duo. Of course, this
change won't affect any "business applications" that don't do animations etc.