On Wed, 23 Jun 2021 00:31:55 GMT, Scott Gibbons 
<github.com+6704669+asgibb...@openjdk.org> wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> It also improves performance on platforms without AVX-512. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
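
The MIME case differs because MIME-encoded input is not pure Base64 alphabet. A minimal sketch using only the public `java.util.Base64` API (the class and helper names are hypothetical, and the intrinsic itself is not exercised): the MIME encoder wraps its output into 76-character lines separated by `\r\n`, which the basic decoder rejects and the MIME decoder skips.

```java
import java.util.Arrays;
import java.util.Base64;

class MimeLineBreaks {
    // MIME encoder: 76-character lines separated by "\r\n", no trailing separator.
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // True if the basic (non-MIME) decoder throws on this input.
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];              // 100 bytes -> 136 Base64 characters
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String mime = mimeEncode(data);
        System.out.println(mime.indexOf("\r\n"));      // 76: first line separator
        System.out.println(mime.length());             // 138: 136 characters + one "\r\n"
        System.out.println(basicDecoderRejects(mime)); // true: CR/LF are not alphabet bytes
        // The MIME decoder skips the separator and recovers the original bytes.
        System.out.println(Arrays.equals(data, Base64.getMimeDecoder().decode(mime))); // true
    }
}
```
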
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to tell 
>> the intrinsic whether MIME decoding is being done.  This allows the 
>> intrinsic to bypass the expensive setup of zmm registers from AVX tables, 
>> knowing there may be invalid Base64 characters (such as line separators) 
>> every 76 characters or so.  A second change removes the restriction that 
>> the intrinsic must return an exact multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and 
>> returns the actual number of bytes decoded.
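
The contract described above (stop at the first non-alphabet byte, decode trailing `=` padding, return a byte count that need not be a multiple of 3) can be sketched as a scalar Java loop. This is a hypothetical illustration, not the DecodeBlock in Base64.java or the intrinsic: the class and method names are invented, only the standard alphabet is handled, the isMIME hint is omitted because in the description it only governs the AVX-512 register setup, and non-zero trailing bits in a padded quantum are not checked.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeBlockSketch {
    // Maps each byte value to its 6-bit Base64 value, or -1 if not in the standard alphabet.
    static final int[] FROM_BASE64 = new int[256];
    static {
        Arrays.fill(FROM_BASE64, -1);
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    // Hypothetical sketch: decodes src[sp, sl) into dst starting at dp and
    // returns the number of bytes written (not necessarily a multiple of 3).
    static int decodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        int start = dp;
        while (sl - sp >= 4) {
            int b1 = FROM_BASE64[src[sp] & 0xff];
            int b2 = FROM_BASE64[src[sp + 1] & 0xff];
            int b3 = FROM_BASE64[src[sp + 2] & 0xff];
            int b4 = FROM_BASE64[src[sp + 3] & 0xff];
            if ((b1 | b2 | b3 | b4) >= 0) {
                // Full quantum: 4 characters -> 3 bytes.
                int bits = b1 << 18 | b2 << 12 | b3 << 6 | b4;
                dst[dp++] = (byte) (bits >> 16);
                dst[dp++] = (byte) (bits >> 8);
                dst[dp++] = (byte) bits;
                sp += 4;
                continue;
            }
            // A padded quantum is accepted only as the last 4 characters of the input.
            if (sp + 4 == sl && b1 >= 0 && b2 >= 0 && src[sp + 3] == '=') {
                if (b3 >= 0) {                    // "xxx=" -> 2 bytes
                    int bits = b1 << 18 | b2 << 12 | b3 << 6;
                    dst[dp++] = (byte) (bits >> 16);
                    dst[dp++] = (byte) (bits >> 8);
                } else if (src[sp + 2] == '=') {  // "xx==" -> 1 byte
                    dst[dp++] = (byte) ((b1 << 18 | b2 << 12) >> 16);
                }
            }
            // Anything else (e.g. a MIME line separator) ends the fast path; the caller handles the rest.
            break;
        }
        return dp - start;
    }

    public static void main(String[] args) {
        byte[] dst = new byte[16];
        for (String s : new String[] {"TWFu", "TWFuTWE=", "TQ==", "TWFu\r\nTWFu"}) {
            byte[] src = s.getBytes(StandardCharsets.US_ASCII);
            int n = decodeBlock(src, 0, src.length, dst, 0);
            System.out.println(n + " " + new String(dst, 0, n, StandardCharsets.US_ASCII));
        }
    }
}
```

Running `main` reports 3, 5, 1 and 3 bytes for the four inputs: `TWFuTWE=` yields 5 bytes, which is not a multiple of 3, and the MIME-style input stops at the line separator.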
>> 
>> The AVX portion of this code decodes in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end-fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the Java DecodeBlock and 
>> behaves identically.
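
The stage structure can be illustrated with simple arithmetic. This is a hypothetical sketch, not the stub's code: it assumes the 256- and 64-byte sizes count encoded input characters and that the input is entirely valid alphabet (no padding, no MIME separators). Since 4 Base64 characters carry 3 bytes, each 256-character block produces 192 output bytes and each 64-character chunk produces 48.

```java
class StageSplit {
    // Hypothetical helper: splits `length` encoded characters into the stages described above.
    // Returns {256-byte blocks, 64-byte chunks, characters left for end fixup}.
    static int[] split(int length) {
        int blocks256 = length / 256;
        int rest = length % 256;
        return new int[] {blocks256, rest / 64, rest % 64};
    }

    // Output bytes produced by the two vector stages (4 characters -> 3 bytes).
    static int vectorOutputBytes(int[] stages) {
        return stages[0] * 192 + stages[1] * 48;
    }

    public static void main(String[] args) {
        for (int len : new int[] {64, 1000, 20000}) {
            int[] s = split(len);
            System.out.printf("%d: %d x 256, %d x 64, %d tail -> %d bytes from vector stages%n",
                    len, s[0], s[1], s[2], vectorOutputBytes(s));
        }
    }
}
```

For 1000 input characters this gives 3 blocks, 3 chunks and 40 characters left for end fixup, with 720 bytes coming from the two vector stages.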
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x, with a maximum of 19.7x for buffers of around 20k.  The 
>> numbers are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 20000 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 50000 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 20000 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 50000 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size 20000 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size 50000 | 1409.41 | 1114.16 | 1.26
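
The Gain column can be checked against its definition. A small sketch (the class and method names are hypothetical): `gain` recomputes one row as Base / Optimized, and `mean` averages the 36 reported gains. The mean comes out at about 2.61, consistent with the 2.6x average quoted above, assuming that figure is the arithmetic mean of the rows, which the description does not state.

```java
import java.util.Arrays;

class GainCheck {
    // Gain as defined above: Base score / Optimized score (> 1 means the intrinsic is faster).
    static double gain(double base, double optimized) {
        return base / optimized;
    }

    // Arithmetic mean of a column of values.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Reported Gain column in table order: Decode, MIMEDecode, WithErrorInputsDecode.
    static final double[] REPORTED_GAINS = {
        1.00, 1.02, 1.09, 1.28, 1.42, 1.37, 1.38, 1.64, 9.01, 13.74, 19.72, 14.15,
        1.07, 1.10, 1.02, 1.10, 1.05, 1.00, 1.05, 1.20, 1.09, 1.12, 1.09, 1.21,
        1.29, 1.12, 1.05, 1.18, 1.02, 1.24, 1.23, 1.24, 1.14, 1.19, 1.23, 1.26,
    };

    public static void main(String[] args) {
        // testBase64Decode size 20000: Base 9530.28, Optimized 483.37.
        System.out.printf("%.2f%n", gain(9530.28, 483.37));                          // 19.72
        System.out.printf("%.2f%n", mean(REPORTED_GAINS));                           // 2.61
        System.out.printf("%.2f%n", Arrays.stream(REPORTED_GAINS).max().getAsDouble()); // 19.72
    }
}
```
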
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fixing Windows build warnings

I hit a strange failure in the compiler/intrinsics/base64/TestBase64.java test on 
a Windows machine with an Intel 8167M CPU (AVX-512).

#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff92bcbd99e, pid=24628, tid=6804
#
# Problematic frame:
# V  [jvm.dll+0xabd99e]  ObjectMonitor::object_peek+0xe
#

Current thread (0x0000016c923de2c0):  JavaThread "MainThread" [_thread_in_Java, id=6804, stack(0x00000060df600000,0x00000060df700000)]

Stack: [0x00000060df600000,0x00000060df700000],  sp=0x00000060df6fcb50,  free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xabd99e]  ObjectMonitor::object_peek+0xe  (objectMonitor.cpp:304)
V  [jvm.dll+0xc48d5b]  ObjectSynchronizer::quick_enter+0x9b  (synchronizer.cpp:331)
V  [jvm.dll+0xb9b6f6]  SharedRuntime::monitor_enter_helper+0x36  (sharedRuntime.cpp:2112)
V  [jvm.dll+0x389894]  Runtime1::monitorenter+0x94  (c1_Runtime1.cpp:748)
C  0x0000016c99c4a757

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::monitorenter_nofpu Runtime1 stub
J 40 c1 java.util.concurrent.ConcurrentHashMap.putVal(Ljava/lang/Object;Ljava/lang/Object;Z)Ljava/lang/Object; java.base@18-internal (432 bytes) @ 0x0000016c9a1801f8 [0x0000016c9a17e6a0+0x0000000000001b58]
J 43 c1 java.util.concurrent.ConcurrentHashMap.putIfAbsent(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; java.base@18-internal (8 bytes) @ 0x0000016c9a181c34 [0x0000016c9a181bc0+0x0000000000000074]
j  java.lang.ClassLoader.getClassLoadingLock(Ljava/lang/String;)Ljava/lang/Object;+23 java.base@18-internal
j  jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(Ljava/lang/String;Z)Ljava/lang/Class;+2 java.base@18-internal
j  jdk.internal.loader.BuiltinClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+3 java.base@18-internal
j  jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class;+36 java.base@18-internal
j  java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;+3 java.base@18-internal
v  ~StubRoutines::call_stub
j  compiler.intrinsics.base64.TestBase64.test0(Lcompiler/intrinsics/base64/TestBase64$FileType;Lcompiler/intrinsics/base64/TestBase64$Base64Type;Ljava/util/Base64$Encoder;Ljava/util/Base64$Decoder;Ljava/lang/String;Ljava/lang/String;I)V+25
j  compiler.intrinsics.base64.TestBase64.main([Ljava/lang/String;)V+116
v  ~StubRoutines::call_stub
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 java.base@18-internal
j  jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+133 java.base@18-internal
j  jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 java.base@18-internal
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 java.base@18-internal
j  com.sun.javatest.regtest.agent.MainWrapper$MainThread.run()V+172
j  java.lang.Thread.run()V+11 java.base@18-internal
v  ~StubRoutines::call_stub

siginfo: EXCEPTION_ACCESS_VIOLATION (0xc0000005), reading address 0x00000000000000bc


Register to memory mapping:

RIP=0x00007ff92bcbd99e jvm.dll::ObjectMonitor::object_peek + 0xe
RAX=0x00000000000000ac is an unknown value
RBX=0x00000000000000ac is an unknown value
RCX=0x00000000000000ac is an unknown value
RDX=0x0 is NULL
RSP=0x00000060df6fcb50 is pointing into the stack for thread: 0x0000016c923de2c0
RBP=0x00000060df6fd110 is pointing into the stack for thread: 0x0000016c923de2c0
RSI=0x0000016c923de2c0 is a thread
RDI=0x0000016c923de2c0 is a thread
R8 =0x00000060df6fd1f0 is pointing into the stack for thread: 0x0000016c923de2c0
R9 =0x00000000000002f8 is an unknown value
R10=0x00007ff92b589800 jvm.dll::Runtime1::monitorenter + 0x0
R11=0x00000060df6fcc78 is pointing into the stack for thread: 0x0000016c923de2c0
R12=0x0 is NULL
R13=0x0000000000000200 is an unknown value
R14=0x0000000000000396 is an unknown value
R15=0x0000016c923de2c0 is a thread


Registers:
RAX=0x00000000000000ac, RBX=0x00000000000000ac, RCX=0x00000000000000ac, RDX=0x0000000000000000
RSP=0x00000060df6fcb50, RBP=0x00000060df6fd110, RSI=0x0000016c923de2c0, RDI=0x0000016c923de2c0
R8 =0x00000060df6fd1f0, R9 =0x00000000000002f8, R10=0x00007ff92b589800, R11=0x00000060df6fcc78
R12=0x0000000000000000, R13=0x0000000000000200, R14=0x0000000000000396, R15=0x0000016c923de2c0
RIP=0x00007ff92bcbd99e, EFLAGS=0x0000000000010206

Top of Stack: (sp=0x00000060df6fcb50)
0x00000060df6fcb50:   0000016c923de2c0 0000000000000000
0x00000060df6fcb60:   0000000000000000 00007ff92b8980a0
0x00000060df6fcb70:   0000016c923de2c0 00007ff92be48d5b
0x00000060df6fcb80:   00000000000000ac 000000074bd727d0
0x00000060df6fcb90:   0000000000000000 0000000000000000
0x00000060df6fcba0:   0000000000000000 00007ff92c1de2b0
0x00000060df6fcbb0:   0000016c923de2c0 00007ff92b8980a0
0x00000060df6fcbc0:   00000060df6fd1f0 00007ff92bd9b6f6
0x00000060df6fcbd0:   000000074bd727d0 0000016c923de2c0
0x00000060df6fcbe0:   00000060df6fd1f0 0000016c923de2c0
0x00000060df6fcbf0:   0000000000000000 0000000000000000
0x00000060df6fcc00:   0000000000000000 0000000000000000
0x00000060df6fcc10:   0000000000000000 0000000000000000
0x00000060df6fcc20:   000000074bd727d0 00007ff92b589894
0x00000060df6fcc30:   000000074bd727d0 00000060df6fd1f0
0x00000060df6fcc40:   0000016c923de2c0 00007ff92b8980a0 

Instructions: (pc=0x00007ff92bcbd99e)
0x00007ff92bcbd89e:   ff 48 8b c8 48 8b d8 48 8b 10 ff 52 48 48 8b 13
0x00007ff92bcbd8ae:   48 8b cb 84 c0 0f 84 83 00 00 00 ff 52 48 84 c0
0x00007ff92bcbd8be:   75 24 4c 8d 0d f1 7b 2e 00 ba 91 05 00 00 4c 8d
0x00007ff92bcbd8ce:   05 05 7c 2e 00 48 8d 0d c6 8b 2d 00 e8 71 aa a0
0x00007ff92bcbd8de:   ff e8 3c c3 01 00 8b 83 88 03 00 00 83 c0 fa a9
0x00007ff92bcbd8ee:   fd ff ff ff 74 23 4c 8d 0d c5 25 4f 00 41 b8 05
0x00007ff92bcbd8fe:   01 00 00 48 8d 15 e0 25 4f 00 b9 00 00 00 e0 e8
0x00007ff92bcbd90e:   4e a7 a0 ff e8 09 c3 01 00 48 8b 03 48 8b cb ff
0x00007ff92bcbd91e:   90 b8 00 00 00 84 c0 75 40 4c 8d 0d fa 25 4f 00
0x00007ff92bcbd92e:   ba 07 01 00 00 4c 8d 05 0e 26 4f 00 eb 1a ff 52
0x00007ff92bcbd93e:   40 84 c0 75 24 4c 8d 0d 7e 86 2d 00 ba 0b 01 00
0x00007ff92bcbd94e:   00 4c 8d 05 22 26 4f 00 48 8d 0d 8b 25 4f 00 e8
0x00007ff92bcbd95e:   ee a9 a0 ff e8 b9 c2 01 00 48 8b 44 24 30 48 8b
0x00007ff92bcbd96e:   48 10 48 85 c9 75 08 33 c0 48 83 c4 20 5b c3 48
0x00007ff92bcbd97e:   83 c4 20 5b 48 ff 25 8f f4 6f 00 cc cc cc cc cc
0x00007ff92bcbd98e:   cc cc 48 89 4c 24 08 48 83 ec 28 48 8b 44 24 30
0x00007ff92bcbd99e:   48 8b 48 10 48 85 c9 75 07 33 c0 48 83 c4 28 c3
0x00007ff92bcbd9ae:   48 83 c4 28 48 ff 25 5f f5 6f 00 cc cc cc cc cc
0x00007ff92bcbd9be:   cc cc 48 89 5c 24 18 48 89 54 24 10 48 89 4c 24
0x00007ff92bcbd9ce:   08 57 48 83 ec 20 48 8b 5c 24 30 48 8d 15 68 37
0x00007ff92bcbd9de:   4f 00 48 8b 7c 24 38 4c 8b c3 48 8b cf e8 50 8f
0x00007ff92bcbd9ee:   02 00 4c 8b 43 08 48 8d 15 6d 37 4f 00 48 8b cf
0x00007ff92bcbd9fe:   e8 3d 8f 02 00 48 8b 4b 10 48 85 c9 75 04 33 c0
0x00007ff92bcbda0e:   eb 06 ff 15 02 f5 6f 00 4c 8b c0 48 8d 15 60 37
0x00007ff92bcbda1e:   4f 00 48 8b cf e8 18 8f 02 00 48 8d 15 69 37 4f
0x00007ff92bcbda2e:   00 48 8b cf e8 09 8f 02 00 48 8d 15 6a 37 4f 00
0x00007ff92bcbda3e:   48 8b cf e8 fa 8e 02 00 48 8d 15 6b 37 4f 00 48
0x00007ff92bcbda4e:   8b cf e8 eb 8e 02 00 41 b8 2f 00 00 00 48 8d 15
0x00007ff92bcbda5e:   5e 37 4f 00 48 8b cf e8 d6 8e 02 00 48 8d 15 5f
0x00007ff92bcbda6e:   37 4f 00 48 8b cf e8 c7 8e 02 00 4c 8b 43 48 48
0x00007ff92bcbda7e:   8d 15 54 37 4f 00 48 8b cf e8 b4 8e 02 00 4c 8b
0x00007ff92bcbda8e:   43 50 48 8d 15 59 37 4f 00 48 8b cf e8 a1 8e 02 

Stack slot to memory mapping:
stack at sp + 0 slots: 0x0000016c923de2c0 is a thread
stack at sp + 1 slots: 0x0 is NULL
stack at sp + 2 slots: 0x0 is NULL
stack at sp + 3 slots: 0x00007ff92b8980a0 jvm.dll::VMEntryWrapper::VMEntryWrapper + 0x110
stack at sp + 4 slots: 0x0000016c923de2c0 is a thread
stack at sp + 5 slots: 0x00007ff92be48d5b jvm.dll::ObjectSynchronizer::quick_enter + 0x9b
stack at sp + 6 slots: 0x00000000000000ac is an unknown value
stack at sp + 7 slots: 0x000000074bd727d0 is an oop: java.util.concurrent.ConcurrentHashMap$Node
{0x000000074bd727d0} - klass: 'java/util/concurrent/ConcurrentHashMap$Node'
 - ---- fields (total size 4 words):
 - final 'hash' 'I' @12  683507634 (28bd7fb2)
 - final 'key' 'Ljava/lang/Object;' @16  "java.util.Base64"{0x000000074bd72788} (e97ae4f1)
 - volatile 'val' 'Ljava/lang/Object;' @20  a 'java/lang/Object'{0x000000074bd727c0} (e97ae4f8)
 - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24  NULL (0)

-------------

PR: https://git.openjdk.java.net/jdk/pull/4368
