Clarification: "Quartz is -7% to +24% (avg. 4%) as fast as OpenGL"
should read: "Quartz is -7% to +24% (avg. 4%) faster than OpenGL"

That is, on the 2009 Mini, the two drawing methods perform similarly when
displaying the images generated by 3D applications (bear in mind that my
application is a high-speed remote display/remote access tool.)

On 10/28/16 11:36 AM, DRC wrote:
> I've been able to optimize my code somewhat, such that it restricts
> OpenGL drawing only to the changed regions of the framebuffer. That at
> least made OpenGL Java 2D blitting usable for my application. However,
> the Quartz Java 2D implementation under Java 6 is still much faster in
> many cases. I have three machines I can test against:
>
> - a 2009 Mac Mini, Intel Core 2 Duo, nVidia GeForce 9400, Mountain Lion
> - a 2011 Macbook Pro, Intel Core i5, Intel HD Graphics 3000, Mavericks
> - a 2014 Mac Mini, Intel Core i7, Intel Iris Graphics, Yosemite
>
> On the 2009 Mini, Quartz is 2-11x (avg. 4.7x) as fast as OpenGL on the
> eight 2D application workloads that I'm testing (these workloads draw a
> lot of very small regions of the framebuffer.) On the twelve 3D
> application workloads (which tend to mostly draw large areas of the
> framebuffer), Quartz is -7% to +24% (avg. 4%) as fast as OpenGL.
>
> On the 2011 MB Pro, Quartz is 4-25x (avg. 10x) as fast as OpenGL on the
> eight 2D application workloads and 1.5-2.3x (avg. 1.9x) as fast as
> OpenGL on the twelve 3D application workloads.
>
> Here's the kicker, though-- it appears that the Quartz-accelerated Java
> 2D blitter is disabled under Yosemite and later, so on my 2014 Mini
> (which requires at least Yosemite), Java 8 (which always uses OpenGL) is
> always much faster than Java 6 (which appears to use an unaccelerated
> Java 2D blitter under OS X 10.10+.) I verified with virtual machines
> that this phenomenon is O/S-related and not hardware-related. It seems
> that Java 6 always disables Quartz blitting on Yosemite and later,
> regardless of the machine.
>
> Unfortunately, because this Quartz-accelerated Java 2D blitter never
> made it into OpenJDK, because Apple discontinued Java for OS X, and
> because-- even on older hardware-- you can't use the Quartz-accelerated
> blitter on newer macOS releases, our only choice now is OpenGL. That
> isn't always the fastest drawing method on Macs. For instance,
> comparing the two Mac Mini models with their fastest drawing method, I
> observe that the 2009 Mini (Quartz, Java 6) is about twice as fast as
> the 2014 Mini (OpenGL, Java 8) on the 2D application workloads. On the
> 3D application workloads, the 2014 Mini (OpenGL, Java 8) is about 40%
> faster than the 2009 Mini (Quartz, Java 6.)
>
> In short, this is still an issue. Under certain workloads, my modern
> machine is performing half as fast as a 2009 machine, because of the
> inability to use Quartz for blitting.
>
>
> On 10/28/16 4:54 AM, Tobi wrote:
>> Any news here Sergey?
>>
>>> On 17.02.2015 at 15:01, Sergey Bylokhov <sergey.bylok...@oracle.com> wrote:
>>>
>>> Hello,
>>> Thanks for the provided info! I am able to reproduce this bug even on
>>> windows: gdi vs ogl. I will take a look at it.
>>>
>>> On 12.02.2015 8:28, DRC wrote:
>>>> On 2/10/15 7:52 AM, Sergey Bylokhov wrote:
>>>>> You can run this test on jdk 8u31 and 8u40 to see a difference:
>>>>> http://cr.openjdk.java.net/~serb/8029253/webrev.04/test/java/awt/image/DrawImage/UnmanagedDrawImagePerformance.java.html
>>>>>
>>>>> And the test from this bug report:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8017247
>>>>
>>>> After looking at those tests, they are definitely not related to the issue
>>>> I'm seeing here. Although the TurboVNC Viewer (my application) does use
>>>> bilinear interpolation if desktop scaling is enabled, that is not the
>>>> "common" usage case. Normally, it's just going to be drawing a
>>>> BufferedImage with no interpolation, so that at least clarifies that I
>>>> shouldn't be expecting any different behavior with Java 9. The question
>>>> now becomes: how to optimally take advantage of the OpenGL pipeline. As
>>>> you pointed out (and I agree, based on my research) reducing the
>>>> software-to-surface blits is key, although I don't have a firm grasp on
>>>> how to do that. My code is basically just doing the following:
>>>>
>>>> public void paintComponent(Graphics g) {
>>>>   Graphics2D g2 = (Graphics2D) g;
>>>>   if (scalingEnabled) {
>>>>     g2.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
>>>>                         RenderingHints.VALUE_INTERPOLATION_BILINEAR);
>>>>     g2.drawImage(im.getImage(), 0, 0, scaledWidth, scaledHeight, null);
>>>>   } else {
>>>>     g2.drawImage(im.getImage(), 0, 0, null);
>>>>   }
>>>>   g2.dispose();
>>>> }
>>>>
>>>> public void updateWindow() {
>>>>   Rect r = damage;
>>>>   if (!r.isEmpty()) {
>>>>     if (scalingEnabled) {
>>>>       // ... (adjust coordinates, mainly)
>>>>       paintImmediately(x, y, width, height);
>>>>     } else {
>>>>       paintImmediately(x, y, width, height);
>>>>     }
>>>>     damage.clear();
>>>>   }
>>>> }
>>>>
>>>> As VNC rectangles from the server are decoded, the "damage" rectangle gets
>>>> updated to reflect the extent of the "damaged" pixels, and that extent is
>>>> passed into paintImmediately(). In examining the OpenJDK source, however,
>>>> it appears that glDrawPixels() is always called with the full extent of
>>>> the BufferedImage, regardless of whether only a small portion of that
>>>> image has actually changed. If there is something else I can do to help
>>>> debug this, please let me know. I have a working JDK build. I fully
>>>> admit that I may be doing something wrong or suboptimally, but bear in
>>>> mind that I've spent probably over 100 hours on this, so it's not as if
>>>> I'm a naive n00b here. If there's something I'm missing, then trust me
>>>> that it isn't obvious!
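The damage-rectangle bookkeeping described above, together with a blit that names only the damaged sub-rectangle, might look roughly like the following. This is a minimal sketch and not TurboVNC's actual code: the class DamageBlitSketch and its methods addDamage, takeDamage and blitRegion are hypothetical names, and it uses the standard ten-argument Graphics.drawImage() overload that takes explicit source and destination corners. Whether the OpenGL pipeline then uploads only that sub-rectangle is an assumption that the thread does not confirm.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical sketch; none of these names come from TurboVNC or OpenJDK. */
final class DamageBlitSketch {
    /** Bounding box of the pixels changed since the last repaint (empty = none). */
    private final Rectangle damage = new Rectangle();

    /** Grow the damage box so that it covers a newly decoded rectangle. */
    void addDamage(int x, int y, int w, int h) {
        if (w <= 0 || h <= 0) return;
        Rectangle r = new Rectangle(x, y, w, h);
        // Rectangle.union() would still include an empty box at (0,0) in the
        // result, so the first rectangle is copied rather than unioned.
        damage.setBounds(damage.isEmpty() ? r : damage.union(r));
    }

    /** Return the accumulated damage and reset it to empty. */
    Rectangle takeDamage() {
        Rectangle r = new Rectangle(damage);
        damage.setBounds(0, 0, 0, 0);
        return r;
    }

    /** Copy only region r of src to the same coordinates of the target. */
    static void blitRegion(Graphics2D g2, BufferedImage src, Rectangle r) {
        // Clamp the request to the image so the source corners stay valid.
        Rectangle c = r.intersection(
            new Rectangle(0, 0, src.getWidth(), src.getHeight()));
        if (c.isEmpty()) return;
        int x2 = c.x + c.width, y2 = c.y + c.height;
        // Destination corners first, then source corners (unscaled copy).
        g2.drawImage(src, c.x, c.y, x2, y2, c.x, c.y, x2, y2, null);
    }

    public static void main(String[] args) {
        BufferedImage framebuffer =
            new BufferedImage(64, 64, BufferedImage.TYPE_INT_ARGB_PRE);
        BufferedImage target =
            new BufferedImage(64, 64, BufferedImage.TYPE_INT_ARGB_PRE);
        DamageBlitSketch sketch = new DamageBlitSketch();
        sketch.addDamage(8, 8, 4, 4);    // two decoded rectangles arrive
        sketch.addDamage(20, 10, 2, 2);
        Rectangle box = sketch.takeDamage();
        Graphics2D g2 = target.createGraphics();
        blitRegion(g2, framebuffer, box);
        g2.dispose();
        System.out.println("blitted " + box);
    }
}
```

In a Swing component, blitRegion() would be called from paintComponent() with the clip bounds that paintImmediately() established, on the assumption that the clip matches the damage box.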
>>>>
>>>>
>>>>> Can you share standalone jar file of this workload?
>>>>
>>>> Here is everything you need to reproduce the issue:
>>>> http://www.turbovnc.org/turbovnc_mac_performance_stuff.tar.gz
>>>>
>>>> Untar, then do
>>>> > cd turbovnc_mac_performance_stuff
>>>> > java -server -d64 -Dsun.java2d.trace=count -cp VncViewer.jar com.turbovnc.vncviewer.ImageDrawTest
>>>> (let it run for 20 seconds or so, then CTRL-C it.)
>>>> > java -server -d64 -jar VncViewer.jar -bench compilation-16.rfb -benchiter 3 -benchwarmup 2
>>>> (let it run to completion.)
>>>>
>>>> Results from Java 6u51 on my Mac Mini (2009 vintage, 2 GHz Intel Core
>>>> Duo, nVidia GeForce 9400):
>>>> ImageDrawTest: ~100 Mpixels/sec
>>>> (all calls are to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntArgbPre))
>>>> compilation-16: Average 1.392763 s (Decode = 0.198173 s, Blit = 1.005974 s)
>>>>
>>>> Results from Java 8u31 on my Mac Mini:
>>>> ImageDrawTest: ~70 Mpixels/sec
>>>> (Calls are split between
>>>> sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface
>>>> (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>> sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntArgbPre, AnyAlpha,
>>>> "OpenGL Surface"))
>>>> compilation-16: Average 6.216550 s (Decode = 0.194989 s, Blit = 5.534781 s)
>>>>
>>>> Results from Java 8u31 on my Mac Mini without alpha-enabled image
>>>> (-Dturbovnc.forcealpha=false):
>>>> ImageDrawTest: ~18 Mpixels/sec
>>>> (Calls are split between:
>>>> sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface
>>>> (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>> sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "OpenGL
>>>> Surface"))
>>>> compilation-16: Average 27.153480 s (Decode = 0.200333 s, Blit = 26.523137 s)
>>>>
>>>> So, as you can see, using an alpha-enabled image improved the performance
>>>> under Java 7/8 by about 4x, both when drawing large images (ImageDrawTest)
>>>> and when doing smaller image updates (compilation-16.) However, the
>>>> blitting performance under Java 7/8 for small image workloads is still
>>>> about 5x slower than it was under Java 6. Results from a different
>>>> machine:
>>>>
>>>> Results from Java 6u51 on my Macbook Pro (2011 vintage, 2.4 GHz Intel
>>>> Core i5, Intel HD Graphics 3000):
>>>> ImageDrawTest: ~100 Mpixels/sec
>>>> (all calls to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntArgbPre))
>>>> compilation-16: Average 0.592772 s (Decode = 0.113879 s, Blit = 0.351596 s)
>>>>
>>>> Results from Java 8u31 on my Macbook Pro:
>>>> ImageDrawTest: ~66 Mpixels/sec
>>>> (Calls split between
>>>> sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface
>>>> (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>> sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntArgbPre, AnyAlpha,
>>>> "OpenGL Surface"))
>>>> compilation-16: Average 6.806324 s (Decode = 0.188252 s, Blit = 6.457852 s)
>>>>
>>>> Results from Java 8u31 on my Macbook Pro without alpha-enabled image
>>>> (-Dturbovnc.forcealpha=false):
>>>> ImageDrawTest: ~50 Mpixels/sec
>>>> (Calls split between
>>>> sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface
>>>> (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>> sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "OpenGL
>>>> Surface"))
>>>> compilation-16: Average 10.272508 s (Decode = 0.147805 s, Blit = 9.955666 s)
>>>>
>>>> Using an ARGB_PRE BufferedImage didn't help out nearly as much on this
>>>> machine, and whereas the large image performance looks similar to that of
>>>> the Mac Mini, the small image blitting performance still suffers by nearly
>>>> a factor of 20 (although it is improved-- before the use of ARGB_PRE
>>>> images, it was about a factor of 30 slower.)
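The "about 5x", "nearly a factor of 20" and "about a factor of 30" figures above can be checked against the quoted compilation-16 Blit times. The following is a small arithmetic sketch only: the class BlitSlowdown and its slowdown method are hypothetical names, and the constants are the Blit values copied from the results above.

```java
/** Hypothetical helper; the timings are the Blit figures quoted above. */
final class BlitSlowdown {
    /** How many times longer `seconds` is than `baselineSeconds`. */
    static double slowdown(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // compilation-16 Blit times in seconds, 2009 Mac Mini
        double miniJava6 = 1.005974;
        double miniJava8ArgbPre = 5.534781;
        double miniJava8Rgb = 26.523137;
        // compilation-16 Blit times in seconds, 2011 Macbook Pro
        double mbpJava6 = 0.351596;
        double mbpJava8ArgbPre = 6.457852;
        double mbpJava8Rgb = 9.955666;

        System.out.printf("Mini, Java 8 ARGB_PRE vs Java 6: %.1fx%n",
            slowdown(miniJava8ArgbPre, miniJava6));
        System.out.printf("Mini, Java 8 RGB vs ARGB_PRE:    %.1fx%n",
            slowdown(miniJava8Rgb, miniJava8ArgbPre));
        System.out.printf("MBP,  Java 8 ARGB_PRE vs Java 6: %.1fx%n",
            slowdown(mbpJava8ArgbPre, mbpJava6));
        System.out.printf("MBP,  Java 8 RGB vs Java 6:      %.1fx%n",
            slowdown(mbpJava8Rgb, mbpJava6));
    }
}
```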
>>>>
>>>> The architecture of this solution makes the use of VolatileImages
>>>> impractical-- basically, I have to decode the VNC rectangles in real time
>>>> as they arrive, so if the VolatileImage were to go away, I would have no
>>>> way of rebuilding it.
>>>
>>>
>>> --
>>> Best regards, Sergey.
>>>
>>