Re: RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32

David Holmes Sun, 26 May 2013 21:40:05 -0700

Hi David,

On 24/05/2013 2:06 AM, David Chase wrote:

So where do we stand on this?
Can we call it a bug and eligible for inclusion?
And are there other issues to deal with?


I think this form of the optimization is more amenable to inclusion.

I would be happy to discuss exactly what is going on in the mysterious
intrinsic-ful code, if that is an issue for anyone (and sooner is better than
later, else I'll forget the exciting details).

I'm not going to attempt to understand any of the details of this. Thatsaid I still have an issue with this code:


 147     if (buf) {

148 /* Don't know for sure how big an unsigned long is,therefore

 149              copy one at a time. */
 150           int i;
 151           for (i = 0; i < 256; i++) buf[i] = (jint) (crc_table[i]);
 152           (*env)->ReleasePrimitiveArrayCritical(env, b, buf, 0);
 153     }

buf is an array of 32-bit values; crc_table is either 32-bit or 64-bitdepending on platform. So on 64-bit we are truncating all the values incrc_table. So presumably these values all fit in 32-bits anyway?


Minor nit:

src/share/classes/java/util/zip/CRC32.java

+ import java.lang.reflect.Field;

I don't think this import is needed now.

---

test/java/util/zip/TimeChecksum.java

You added eg:

time(adler32, data, 2*iters, 16); // warmup short case, too

which is presumably to ensure both of the primary paths are hit duringthe warm up. But it isn't obvious to me that the actual test will hitthe short case ??


Thanks,
David

David

On 2013-05-21, at 5:15 PM, David Chase <[email protected]> wrote:


http://cr.openjdk.java.net/~drchase/7088419/webrev.03/

Newer, slimmer webrev.  No fork/join, the related code is removed (except the
native init routine still returns a boolean, which is currently ignored, 
indicating if
it supports the faster crc32).

What remains is:

1) no-JNI fast-path for short CRCs and Adlers
2) reformulated faster-Adler for byte-at-a-time
3) For 32/64 Linux/Solaris/MacOS x86 Sandy Bridge or better (with CLMUL and 
AVX),
     fast code for CRCs.

For now it uses assembly language, the C-with-intrinsics is included, and 
compiles
with current clang and gcc.  It tickles a bug in Solaris Studio 12.3 which has 
been reported.
Optimization for Windows is disabled because the assembler syntax is too 
different

The code has been tested by-hand across Linux (32), Solaris (32 and 64), MacOS 
(64)
and has also passed JPRT (which is to say, the failures were unrelated, hand 
examination
of the runs showed that the new CRCandAdler test was successful.  Anyone 
checking my
most recent run will notice that I am not very good at specifying tests to run, 
but there is
an earlier run.  I'm trying again, just to see if I can figure out how to make 
it behave.)

There is a companion webrev on the hotspot side that sets the secret property
that is used and reset by this code.

I hope this will be more easily regarded as a "bug fix" and less as an 
interface change.

David

On 2013-05-20, at 8:34 PM, David Holmes <[email protected]> wrote:

On 21/05/2013 3:45 AM, Alan Bateman wrote:

On 20/05/2013 14:49, David Chase wrote:

Suppose I split this bug (i.e., file a new bug) into the
Intel-acceleration part and the fork-join part.
:

This make sense.


I agree.

I also don't have an issue with eliding the optimization on Windows for the 
time being.

David

-Alan.

Re: RFR :7088419 : (L) Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32

Reply via email to