A thorny problem. Here is a list of related bugs "fixed" in 5.0 updates:
6348045: REGRESSION: serious performance degradation as GZIPInputStream is slower
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6348045

6206933: GZipOutputStream/InputStream goes critical (calls JNI_Get*Critical) and causes slowness
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6206933

6364346: GZIPOutputStream is slower on 1.4.2_11-b02 than on 1.4.2_09
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6364346

Sun engineers have tried to get reasonable performance without using
JNI_Get*Critical, since that introduces other serious performance
problems. It was my belief that any pathological n^2 performance
problems had been truly fixed. Make sure you are running at least
5.0u8 to get all of the above.

Martin

Clemens Eisserer wrote:
> Hello,
>
> Somebody posted at
> http://forums.java.net/jive/thread.jspa?messageID=251006 that he has
> problems with the performance of java.util.zip.Deflater starting with
> version 1.5.0_07.
> I did a very simple micro-benchmark, and it seems to confirm this: with
> small buffers (the original author used a 1000-byte buffer), 1.4.2
> took ~1000 ms whereas 6.0/7.0b23 take 11000 ms. Even when using a 32 kB
> buffer, 1.4.2 is still twice as fast.
> I played a bit with oprofile, and it clearly shows that memcpy eats
> nearly all of the CPU time.
>
> The problem is that the whole remaining input buffer is copied to the
> native side on every call. Assuming each call consumes 2000 bytes of
> input (a 1000-byte output buffer at a 50% compression ratio),
> successive calls to deflateBytes copy 5000k, then 4998k, then 4996k, ...
> This can't be solved easily, because we don't know in advance how many
> bytes zlib will consume from the input data.
>
> I have a few ideas for how this issue could be solved:
>
> 1.) Use DirectByteBuffers for data transfer.
> pros: array-like access from the native side, no negative impact on the GC.
> cons: data still has to be copied; wasted RAM (we hold two copies,
> one in the byte[] supplied by the user and one outside the heap in
> the DirectByteBuffer); possible OOMs from running out of native memory.
>
> 2.) Use GetPrimitiveArrayCritical:
> pros: no copying involved at all, no redundant copies of data around.
> cons: quite harsh on the GC (which is blocked until compression is
> finished) - maybe even a scalability limiter.
> I've modified Deflater.c to use GetPrimitiveArrayCritical, and it now
> compresses in 100 ms instead of 11000 ms - even twice as fast as 1.4.2.
> Although this solution looks quite cool, I doubt its behaviour
> complies with Sun's quality expectations.
>
> 3.) Limit the number of bytes transferred to the native side per call:
> pros: no redundant copies of input data.
> cons: still a lot of copying (though no longer n^2), and maybe more
> JNI calls to get the same work done.
>
> I would be happy about suggestions and thoughts in general. Maybe
> somebody knows why the old JVMs performed so much better here?
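To put a number on the copying: with the 5 MB input in the test case
below, and assuming ~2000 bytes of input consumed per call as above,
that is roughly 2500 calls to deflateBytes copying ~2.5 MB each on
average, i.e. on the order of 6 GB of memcpy traffic per compression
run. Here is a minimal sketch of idea 3 done entirely on the Java side:
hand the Deflater its input in bounded chunks, so each native call
copies at most one chunk's worth of bytes. The class name and chunk
size are illustrative, not a proposed API:

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ChunkedDeflate
{
    // Illustrative chunk size: big enough to amortize the JNI call
    // overhead, small enough that each native-side copy stays cheap.
    private static final int CHUNK = 64 * 1024;

    public static byte[] deflateChunked(byte[] input)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] outBuf = new byte[1000];
        Deflater deflater = new Deflater();

        if (input.length == 0)
            deflater.finish();              // degenerate case: empty input

        int pos = 0;
        while (pos < input.length)
        {
            int len = Math.min(CHUNK, input.length - pos);
            // Only this chunk is visible to the native side, so each
            // deflateBytes call copies at most CHUNK bytes.
            deflater.setInput(input, pos, len);
            pos += len;
            if (pos == input.length)
                deflater.finish();          // last chunk: signal end of input

            while (!deflater.needsInput() && !deflater.finished())
            {
                int n = deflater.deflate(outBuf);
                out.write(outBuf, 0, n);
            }
        }

        // Drain whatever compressed output is still buffered after finish().
        while (!deflater.finished())
        {
            int n = deflater.deflate(outBuf);
            out.write(outBuf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}

This bounds each per-call copy by CHUNK instead of by the whole
remaining input, so the total copy work grows linearly with the input
size rather than quadratically.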
>
> Thank you in advance, lg Clemens
>
>
> Test-Case:
>
> import java.io.ByteArrayOutputStream;
> import java.util.Random;
> import java.util.zip.Deflater;
>
> public class DeflaterTest
> {
>     public static byte[] compresserZlib(byte[] donnees)
>     {
>         ByteArrayOutputStream resultat = new ByteArrayOutputStream();
>         byte[] buffer = new byte[1000];
>         int nbEcrits;
>
>         Deflater deflater = new Deflater();
>         deflater.setInput(donnees);
>         deflater.setLevel(0);
>         deflater.finish();
>
>         while (!deflater.finished())
>         {
>             nbEcrits = deflater.deflate(buffer);
>             resultat.write(buffer, 0, nbEcrits);
>         }
>
>         return resultat.toByteArray();
>     }
>
>     public static void main(String[] args)
>     {
>         Random r = new Random();
>         byte[] buffer = new byte[5000000];
>         for (int i = 0; i < buffer.length; i++)
>         {
>             buffer[i] = (byte) (r.nextInt() % 127);
>         }
>
>         for (int i = 0; i < 100; i++)
>         {
>             long start = System.currentTimeMillis();
>             byte[] result = compresserZlib(buffer);
>             long end = System.currentTimeMillis();
>
>             System.out.println("Run took: " + (end - start)
>                     + " " + result[Math.abs(buffer[0])]);
>         }
>     }
> }
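Note that compresserZlib runs with setLevel(0), i.e. no compression at
all, so the time measured is dominated by data movement rather than by
compression proper. For completeness, a similar bounded-input effect
can be had from the stock library by pushing the data through
java.util.zip.DeflaterOutputStream in moderate writes, since setInput
then never sees more than one write's worth of bytes. A sketch, with
illustrative (untuned) buffer sizes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class StreamedDeflate
{
    public static byte[] deflateStreamed(byte[] input) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Deflater deflater = new Deflater();
        DeflaterOutputStream dos =
                new DeflaterOutputStream(bos, deflater, 64 * 1024);

        // Feed the input in bounded slices so each underlying setInput()
        // covers at most SLICE bytes, never the whole remaining array.
        final int SLICE = 64 * 1024;
        for (int pos = 0; pos < input.length; pos += SLICE)
        {
            dos.write(input, pos, Math.min(SLICE, input.length - pos));
        }
        dos.finish();       // flush the trailing compressed data
        deflater.end();     // release the native zlib state
        return bos.toByteArray();
    }
}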
