Hi Ig,

Thanks for the suggestions.  Here's some more history.

Clemens Eisserer wrote:
Hi again,

Sun engineers have tried to get reasonable performance
without using JNI_Get*Critical, since that introduces other
serious performance problems.  It was my belief that any
pathological n^2 performance problems had been truly fixed.
At least the code in JDK7u23 looks like (n^2)/2 or something like
that: on every call it copies all of the bytes that are left, including
a malloc/free.

Successive attempts to address this performance / scalability problem have focused on minimizing the amount of data copied. As noted, the fix is in DeflaterOutputStream.write, where stride bytes at a time are deflated, *not* the entire user-provided data buffer. This results in more JNI calls (and consequently more malloc-copy-deflate-free) but does not stall GC.
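
To make the striding idea concrete, here is a minimal sketch of feeding a Deflater one stride at a time instead of handing it the whole user buffer. It is not the actual JDK code: the class name, the helper names and the stride value of 512 are assumptions made up for illustration, and only the public java.util.zip API is used.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of the JDK.
class StrideDeflateSketch {
    // Assumed stride size, chosen only for the example.
    static final int STRIDE = 512;

    // Deflates data by passing at most 'stride' bytes per setInput call,
    // so each native call only has to deal with a small slice of the input.
    static byte[] deflateInStrides(byte[] data, int stride) {
        Deflater def = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[Math.max(stride, 64)];
        for (int off = 0; off < data.length; off += stride) {
            int len = Math.min(stride, data.length - off);
            def.setInput(data, off, len);
            // Drain the deflater until it has consumed this stride.
            while (!def.needsInput()) {
                int n = def.deflate(buf, 0, buf.length);
                out.write(buf, 0, n);
            }
        }
        // Flush whatever the deflater is still holding and write the trailer.
        def.finish();
        while (!def.finished()) {
            int n = def.deflate(buf, 0, buf.length);
            out.write(buf, 0, n);
        }
        def.end();
        return out.toByteArray();
    }

    // Round-trip helper used only to check the sketch.
    static byte[] inflate(byte[] compressed, int expectedLen) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        byte[] out = new byte[expectedLen];
        int off = 0;
        try {
            while (!inf.finished() && off < out.length) {
                int n = inf.inflate(out, off, out.length - off);
                if (n == 0 && inf.needsInput()) {
                    break; // truncated input; stop rather than spin
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return Arrays.copyOf(out, off);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        byte[] z = deflateInStrides(data, STRIDE);
        System.out.println(Arrays.equals(data, inflate(z, data.length)));
    }
}
```

The trade-off is the one described above: more setInput/deflate calls for the same data, in exchange for each call touching only a stride-sized slice.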

Sun engineers have tried to get reasonable performance
without using JNI_Get*Critical, since that introduces other
serious performance problems.
Could you please tell me when and why? As far as I understood the problem
with the *Critical*-Functions is that they hinder the JVM in doing
some operations (GC, ...) which limits scalability.

Prior to 1.5.0_u7 the *Critical* functions were used, but for the sake of 6206933 their use was replaced with data copying.

If this is the only reason, using them may not be that bad if the
Get*ArrayRegion also has some GC-atomic behaviour. Copying 50mb data
atomically also blocks the GC, doesn't it?

I am working on a fix which processes the data in "strides", therefore
the lock is only held a short time. Is this really a bad idea, except
for the additional JNI overhead?

The observations I've made show that the use of strides results in 2x slower performance compared with the *Critical* functions. Certainly not ideal, but much better than the ~10x worse performance of early attempts at resolving the issue.

FWIW, we looked into using DirectByteBuffer but did not like the idea of keeping 2 copies of data around.

Moving the striding from DeflaterOutputStream to Deflater (and possibly providing similar functionality on the Inflater side) seems like a Good Idea. IIRC, we put the striding into DeflaterOutputStream because that has the buffer whose size is known (and optionally provided by the user when the instance is created).

Thanks,
        Dave



Thanks, lg Clemens
