Good to see you pop up! What are your thoughts on getting to 1.0?
Gary

---------- Forwarded message ----------
From: <[email protected]>
Date: Thu, Oct 16, 2014 at 12:49 AM
Subject: svn commit: r1632210 - /commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java
To: [email protected]


Author: damjan
Date: Thu Oct 16 04:49:30 2014
New Revision: 1632210

URL: http://svn.apache.org/r1632210
Log:
Format some comments better.

Modified:
    commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java

Modified: commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java
URL: http://svn.apache.org/viewvc/commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java?rev=1632210&r1=1632209&r2=1632210&view=diff
==============================================================================
--- commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java (original)
+++ commons/proper/imaging/trunk/src/main/java/org/apache/commons/imaging/formats/jpeg/decoder/Dct.java Thu Oct 16 04:49:30 2014
@@ -22,10 +22,13 @@ final class Dct {
  * Here's the cost, exluding modified (de)quantization, for transforming an
  * 8x8 block:
  *
- * Algorithm Adds Multiplies RightShifts Total Naive 896 1024 0 1920
- * "Symmetries" 448 224 0 672 Vetterli and 464 208 0 672 Ligtenberg Arai,
- * Agui and 464 80 0 544 Nakajima (AA&N) Feig 8x8 462 54 6 522 Fused mul/add
- * 416 (a pipe dream)
+ * Algorithm Adds Multiplies RightShifts Total
+ * Naive 896 1024 0 1920
+ * "Symmetries" 448 224 0 672
+ * Vetterli and Ligtenberg 464 208 0 672
+ * Arai, Agui and Nakajima (AA&N) 464 80 0 544
+ * Feig 8x8 462 54 6 522
+ * Fused mul/add (a pipe dream) 416
  *
  * IJG's libjpeg, FFmpeg, and a number of others use AA&N.
  *
@@ -33,21 +36,25 @@ final class Dct {
  * are reduced from 80 in AA&N to only 54. But in practice:
  *
  * Benchmarks, Intel Core i3 @ 2.93 GHz in long mode, 4 GB RAM Time taken to
- * do 100 million IDCTs (less is better): Rene' Stöckel's Feig, int: 45.07
- * seconds My Feig, floating point: 36.252 seconds AA&N, unrolled loops,
- * double[][] -> double[][]: 25.167 seconds
+ * do 100 million IDCTs (less is better):
+ * Rene' Stöckel's Feig, int: 45.07 seconds
+ * My Feig, floating point: 36.252 seconds
+ * AA&N, unrolled loops, double[][] -> double[][]: 25.167 seconds
  *
  * Clearly Feig is hopeless. I suspect the performance killer is simply the
  * weight of the algorithm: massive number of local variables, large code
  * size, and lots of random array accesses.
  *
- * Also, AA&N can be optimized a lot: AA&N, rolled loops, double[][] ->
- * double[][]: 21.162 seconds AA&N, rolled loops, float[][] -> float[][]: no
- * improvement, but at some stage Hotspot might start doing SIMD, so let's
+ * Also, AA&N can be optimized a lot:
+ * AA&N, rolled loops, double[][] -> double[][]: 21.162 seconds
+ * AA&N, rolled loops, float[][] -> float[][]: no improvement,
+ * but at some stage Hotspot might start doing SIMD, so let's
  * use float AA&N, rolled loops, float[] -> float[][]: 19.979 seconds
- * apparently 2D arrays are slow! AA&N, rolled loops, inlined 1D AA&N
- * transform, float[] transformed in-place: 18.5 seconds AA&N, previous
- * version rewritten in C and compiled with "gcc -O3" takes: 8.5 seconds
+ * apparently 2D arrays are slow!
+ * AA&N, rolled loops, inlined 1D AA&N
+ * transform, float[] transformed in-place: 18.5 seconds
+ * AA&N, previous version rewritten in C and compiled with "gcc -O3"
+ * takes: 8.5 seconds
  * (probably due to heavy use of SIMD)
  *
  * Other brave attempts: AA&N, best float version converted to 16:16 fixed

-- 
E-Mail: [email protected] | [email protected]
Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
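The "Naive" row of the cost table in the commit above is consistent with a row-column (separable) decomposition: 16 one-dimensional 8-point transforms (8 rows, then 8 columns), each a plain 8x8 matrix-vector product costing 64 multiplies and 56 adds, gives 16 x 64 = 1024 multiplies and 16 x 56 = 896 adds. Below is a minimal sketch of that baseline, assuming a row-major float[64] block of already dequantized coefficients and the usual JPEG IDCT scaling (1/4 * C(u) * C(v), with C(0) = 1/sqrt(2)). The class name NaiveIdct is hypothetical; this is not the Commons Imaging Dct code, just a reference point for what AA&N and Feig improve on. It uses a flat float[] rather than float[][], the layout the benchmarks above found faster.

```java
/**
 * Hypothetical sketch, not Commons Imaging's Dct: a naive separable 8x8 IDCT
 * on a flat, row-major float[64] block. Eight row transforms are followed by
 * eight column transforms; each 1-D pass is an 8x8 matrix-vector product
 * (64 multiplies, 56 adds), so a block costs 1024 multiplies and 896 adds.
 */
public final class NaiveIdct {
    // COEF[x * 8 + u] = 0.5 * C(u) * cos((2x + 1) * u * PI / 16),
    // where C(0) = 1/sqrt(2) and C(u) = 1 otherwise.
    private static final float[] COEF = new float[64];

    static {
        for (int x = 0; x < 8; x++) {
            for (int u = 0; u < 8; u++) {
                double c = (u == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                COEF[x * 8 + u] = (float) (0.5 * c * Math.cos((2 * x + 1) * u * Math.PI / 16.0));
            }
        }
    }

    private NaiveIdct() {
    }

    /** Naive 8-point 1-D IDCT of data[offset + k * stride], k = 0..7, written back in place. */
    private static void idct1d(float[] data, int offset, int stride, float[] tmp) {
        for (int x = 0; x < 8; x++) {
            // 8 multiplies and 7 adds per output sample.
            float sum = COEF[x * 8] * data[offset];
            for (int u = 1; u < 8; u++) {
                sum += COEF[x * 8 + u] * data[offset + u * stride];
            }
            tmp[x] = sum;
        }
        for (int x = 0; x < 8; x++) {
            data[offset + x * stride] = tmp[x];
        }
    }

    /** Inverse-transforms a row-major 8x8 block of dequantized coefficients in place. */
    public static void idct8x8(float[] block) {
        float[] tmp = new float[8];
        for (int row = 0; row < 8; row++) {
            idct1d(block, row * 8, 1, tmp); // a row is 8 contiguous elements
        }
        for (int col = 0; col < 8; col++) {
            idct1d(block, col, 8, tmp); // a column has a stride of 8
        }
    }

    public static void main(String[] args) {
        // DC-only block: F(0,0) = 8 should decode to a flat block of about 1.0.
        float[] block = new float[64];
        block[0] = 8f;
        idct8x8(block);
        System.out.println(block[0] + " " + block[63]);
    }
}
```

Running main on the DC-only block should print two values very close to 1.0 (float rounding may show something like 0.99999994). Swapping the inner 1-D routine for a factored one (such as AA&N) leaves the row-then-column structure unchanged; only the per-pass operation count drops.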
