A thorny problem. Here is a list of related bugs "fixed" in 5.0 updates:
6348045: REGRESSION: serious performance degradation as GZIPInputStream is slower
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6348045

6206933: GZipOutputStream/InputStream goes critical (calls JNI_Get*Critical) and causes slowness
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6206933

6364346: GZIPOutputStream is slower on 1.4.2_11-b02 than on 1.4.2_09
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6364346

Sun engineers have tried to get reasonable performance without using
JNI_Get*Critical, since that introduces other serious performance
problems. It was my belief that any pathological n^2 performance
problems had been truly fixed. Make sure you are running at least
5.0u8 to get all of the above.

Martin

Clemens Eisserer wrote:
> Hello,
>
> Somebody posted at
> http://forums.java.net/jive/thread.jspa?messageID=251006 that he has
> problems with the performance of java.util.zip.Deflater starting with
> version 1.5.0_07.
> I did a very simple micro-benchmark, and it seems to confirm this: with
> small buffers (the original author used a 1000-byte buffer), 1.4.2
> took ~1000 ms whereas 6.0/7.0b23 take 11000 ms. Even when using a 32 kB
> buffer, 1.4.2 is still twice as fast.
> I played a bit with oprofile, and it clearly shows that memcpy eats
> nearly all of the CPU time.
>
> The problem is that the whole remaining input buffer is copied to the
> native side on every call. Assuming each call consumes 2000 bytes of
> input (a 1000-byte output buffer at a 50% compression ratio),
> successive calls to deflateBytes copy 5000k, then 4998k, then 4996k, ...
> This can't be solved easily, because we don't know in advance how many
> bytes zlib will consume from the input data.
>
> I have a few ideas for how this issue could be solved:
>
> 1.) Use DirectByteBuffers for data transfer.
> pros: array-like access from the native side, no negative impact on the GC.
> cons: data still has to be copied; wasted RAM (we hold two copies,
> one in the byte[] supplied by the user and one outside the heap in
> the DirectByteBuffer); possible OOMs from running out of native memory.
>
> 2.) Use GetPrimitiveArrayCritical:
> pros: no copying involved at all, no redundant copies of data around.
> cons: quite harsh on the GC (which is blocked until compression is
> finished) - maybe even a scalability limiter.
> I've modified Deflater.c to use GetPrimitiveArrayCritical, and it now
> compresses in 100 ms instead of 11000 ms - even twice as fast as 1.4.2.
> Although this solution looks quite cool, I doubt its behaviour
> complies with Sun's quality expectations.
>
> 3.) Limit the number of bytes transferred to the native side per call:
> pros: no redundant copies of input data.
> cons: still a lot of copying (though no longer n^2), and maybe more
> JNI calls to get the same work done.
>
> I would be happy about suggestions and thoughts in general. Maybe
> somebody knows why the old JVMs performed so much better here?
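To put a number on the copying: with the 5 MB input in the test case
below, and assuming ~2000 bytes of input consumed per call as above,
that is roughly 2500 calls to deflateBytes copying ~2.5 MB each on
average, i.e. on the order of 6 GB of memcpy traffic per compression
run. Here is a minimal sketch of idea 3 done entirely on the Java side:
hand the Deflater its input in bounded chunks, so each native call
copies at most one chunk's worth of bytes. The class name and chunk
size are illustrative, not a proposed API:

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ChunkedDeflate
{
    // Illustrative chunk size: big enough to amortize the JNI call
    // overhead, small enough that each native-side copy stays cheap.
    private static final int CHUNK = 64 * 1024;

    public static byte[] deflateChunked(byte[] input)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] outBuf = new byte[1000];
        Deflater deflater = new Deflater();

        if (input.length == 0)
            deflater.finish();              // degenerate case: empty input

        int pos = 0;
        while (pos < input.length)
        {
            int len = Math.min(CHUNK, input.length - pos);
            // Only this chunk is visible to the native side, so each
            // deflateBytes call copies at most CHUNK bytes.
            deflater.setInput(input, pos, len);
            pos += len;
            if (pos == input.length)
                deflater.finish();          // last chunk: signal end of input

            while (!deflater.needsInput() && !deflater.finished())
            {
                int n = deflater.deflate(outBuf);
                out.write(outBuf, 0, n);
            }
        }

        // Drain whatever compressed output is still buffered after finish().
        while (!deflater.finished())
        {
            int n = deflater.deflate(outBuf);
            out.write(outBuf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}

This bounds each per-call copy by CHUNK instead of by the whole
remaining input, so the total copy work grows linearly with the input
size rather than quadratically.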
>
> Thank you in advance, lg Clemens
>
>
> Test-Case:
>
> import java.io.ByteArrayOutputStream;
> import java.util.Random;
> import java.util.zip.Deflater;
>
> public class DeflaterTest
> {
>     public static byte[] compresserZlib(byte[] donnees)
>     {
>         ByteArrayOutputStream resultat = new ByteArrayOutputStream();
>         byte[] buffer = new byte[1000];
>         int nbEcrits;
>
>         Deflater deflater = new Deflater();
>         deflater.setInput(donnees);
>         deflater.setLevel(0);
>         deflater.finish();
>
>         while (!deflater.finished())
>         {
>             nbEcrits = deflater.deflate(buffer);
>             resultat.write(buffer, 0, nbEcrits);
>         }
>
>         return resultat.toByteArray();
>     }
>
>     public static void main(String[] args)
>     {
>         Random r = new Random();
>         byte[] buffer = new byte[5000000];
>         for (int i = 0; i < buffer.length; i++)
>         {
>             buffer[i] = (byte) (r.nextInt() % 127);
>         }
>
>         for (int i = 0; i < 100; i++)
>         {
>             long start = System.currentTimeMillis();
>             byte[] result = compresserZlib(buffer);
>             long end = System.currentTimeMillis();
>
>             System.out.println("Run took: " + (end - start)
>                     + " " + result[Math.abs(buffer[0])]);
>         }
>     }
> }
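Note that compresserZlib runs with setLevel(0), i.e. no compression at
all, so the time measured is dominated by data movement rather than by
compression proper. For completeness, a similar bounded-input effect
can be had from the stock library by pushing the data through
java.util.zip.DeflaterOutputStream in moderate writes, since setInput
then never sees more than one write's worth of bytes. A sketch, with
illustrative (untuned) buffer sizes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class StreamedDeflate
{
    public static byte[] deflateStreamed(byte[] input) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Deflater deflater = new Deflater();
        DeflaterOutputStream dos =
                new DeflaterOutputStream(bos, deflater, 64 * 1024);

        // Feed the input in bounded slices so each underlying setInput()
        // covers at most SLICE bytes, never the whole remaining array.
        final int SLICE = 64 * 1024;
        for (int pos = 0; pos < input.length; pos += SLICE)
        {
            dos.write(input, pos, Math.min(SLICE, input.length - pos));
        }
        dos.finish();       // flush the trailing compressed data
        deflater.end();     // release the native zlib state
        return bos.toByteArray();
    }
}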
