Fully agree that using Unsafe makes one sad.
I'm just about to send out a new webrev with Alan's and Peter's
comments, once I have that done I will give using the NIO-Buffer API a
second try to see if using IntBuffer and LongBuffer is able to achieve
similar performance.
As I noted in my reply the second goal after adding this API will is to
create intrinsics that make use of the crc32c instructions available on
both x86 and SPARC which will bump the performance even further. So one
thing I try to do is make sure the implementation makes it easy to do
that without having to completely rewrite it again.
Regards,
Staffan
On 10/17/2014 11:44 AM, Stanimir Simeonoff wrote:
Actually I was under the impression I had included the list. Getting
it done w now.
Overall if you want speed you go unsafe (bit sad) as the JIT may not
remove the bound checks off the ByteBuffer. Unsafe is pretty ugly
overall, though and personally I try to avoid it giving up performance.
On 64bit machines you might be better off with java.nio.LongBuffer
instead.
Stanimir
On Fri, Oct 17, 2014 at 9:06 PM, Staffan Friberg
<staffan.frib...@oracle.com <mailto:staffan.frib...@oracle.com>> wrote:
Hi Stanimir,
Thanks for you idea, do you mind re-replying with reply all so it
goes to the list? Or I can simply reply to myself and include your
comment.
I forgot to add in my previous email that one of the follow up
items for this will be to implement intrinsics that uses available
hardware instructions, having a method that takes the buffer
address will make that much easier which is another reason to stay
with the current implementation.
Thanks,
Staffan
On 10/17/2014 10:06 AM, Stanimir Simeonoff wrote:
Hi Staffan
The actual "trick" is using ByteBuffer.asIntBuffer() while the
address is properly aligned. It's not the most intuitive approach
but you get the read stuff.
Cheers
Stanimir
On Fri, Oct 17, 2014 at 7:47 PM, Staffan Friberg
<staffan.frib...@oracle.com <mailto:staffan.frib...@oracle.com>>
wrote:
On 10/17/2014 01:46 AM, Peter Levart wrote:
On 10/17/2014 03:42 AM, Staffan Friberg wrote:
Hi,
This RFE adds a CRC-32C class. It implements Checksum
so it will have the same API CRC-32, but use a
different polynomial when calculating the CRC checksum.
CRC-32C implementation uses slicing-by-8 to achieve
high performance when calculating the CRC value.
A part from adding the new class,
java.util.zip.CRC32C, I have also added two default
methods to Checksum. These are methods that were
added to Adler32 and CRC32 in JDK 8 but before
default methods were added, which was why they were
only added to the implementors and not the interface.
Bug: https://bugs.openjdk.java.net/browse/JDK-6321472
Webrev:
http://cr.openjdk.java.net/~sfriberg/JDK-6321472/webrev.00
<http://cr.openjdk.java.net/%7Esfriberg/JDK-6321472/webrev.00>
I have started a CCC request for the changes, but was
asked to get feedback from the core libs group before
finalizing the request in case there are any API or
Javadoc changes suggested.
Thanks,
Staffan
Hi Staffan,
I can see CRC32C.reflect(int) method reverses the bits in
32 bit int value. You could use Integer.reverse(int) instead.
The CRC32C.swap32(int) method is (almost) exactly the
same as Integer.reverseBytes(int) and equivalent.
I wonder if handling ByteBuffer could be simplified. You
could leverage it's own byte order manipulation by
temporarily setting (and resetting afterwards)
ByteBuffer.order() and then use ByteBuffer.getInt() to
extract 32 bits at a time for your algorithm. This could
get you the optimal variant of algorithm for both kinds
of buffers (direct or byte[] based). Perhaps even the
byte[] based variant of algorithm could be implemented by
wrapping the array with ByteBuffer, passing it to common
private method, and relying on the escape analysis of
Hotspot to allocate the HeapByteBuffer wrapper object on
stack.
Regards, Peter
Hi Peter,
Thanks for reviewing.
I have switched to the Integer methods. Was looking through
that API but I was too stuck with the reflect and swap names
so I missed the reverse methods... :)
As Vitaly noted in his email the wrapped case runs much
slower. Going through the generated code it looks like the
getInt method actually read four bytes and then builds and
int from them, unless we have some intrinsic replacing that code.
Bits.java
static int getIntL(long a) {
return makeInt(_get(a + 3),
_get(a + 2),
_get(a + 1),
_get(a ));
}
static private int makeInt(byte b3, byte b2, byte b1,
byte b0) {
return (((b3 ) << 24) |
((b2 & 0xff) << 16) |
((b1 & 0xff) << 8) |
((b0 & 0xff) ));
}
It looks like the same holds true for DirectByteBuffers
unless you are on x86 which supports unaligned reads. So I
think aligning and using Unsafe is the best option here for
performance.
DirectByteBuffer.java
private int getInt(long a) {
if (unaligned) {
int x = unsafe.getInt(a);
return (nativeByteOrder ? x : Bits.swap(x));
}
return Bits.getInt(a, bigEndian);
}
Bits.java
static boolean unaligned() {
if (unalignedKnown)
return unaligned;
String arch = AccessController.doPrivileged(
new
sun.security.action.GetPropertyAction("os.arch"));
unaligned = arch.equals("i386") || arch.equals("x86")
|| arch.equals("amd64") || arch.equals("x86_64");
unalignedKnown = true;
return unaligned;
}
Regards,
Staffan