Hi Roger,
Given the concern about a possibly incompatible behavior change (over-reading bytes
from the underlying stream), I have decided to drop the last proposed changes to
DecInputStream for now. With some "minor" cleanup and tuning I still see about a
10%+ improvement across various input-size samples. Let's address the decoder
stream in a separate RFE later, if desired.
The updated webrev and the JMH results are at
http://cr.openjdk.java.net/~sherman/8164278/webrev
http://cr.openjdk.java.net/~sherman/8164278/base64.bm
The previous version of the webrev and the corresponding JMH results can be found at
http://cr.openjdk.java.net/~sherman/8164278/webrev.02
http://cr.openjdk.java.net/~sherman/8164278/base64.bm.old
Thanks,
Sherman
On 2/5/18, 6:00 PM, Xueming Shen wrote:
Hi,
Please help review the change for JDK-8164278.
issue: https://bugs.openjdk.java.net/browse/JDK-8164278
webrev: http://cr.openjdk.java.net/~sherman/8164278/webrev
jmh.src: http://cr.openjdk.java.net/~sherman/8164278/Base64BM.java
jmh.result: http://cr.openjdk.java.net/~sherman/8164278/base64.bm

Base64.Decoder.decode0():

Adopted the "similar" optimization approach we took in Base64.Encoder.encode0():
add a "fast path" that decodes a block of 4-byte units together (the current
implementation decodes a single byte per iteration of the while loop). The JMH
benchmark results indicate a big speed boost (the decodeArray/Mime/Url results
are from 30% to 2 times faster, depending on input size). A rough sketch of the
fast-path loop follows below.
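
The sketch below is only meant to show the shape of that fast path; the names are
illustrative, not the actual webrev code. base64[] stands for the usual lookup
table that maps an ASCII byte to its 6-bit value and is negative for '=', CR/LF
and illegal bytes, so a single sign check covers every case that needs the slow
path.

    // Decode as many complete 4-char units as possible; the existing
    // per-byte loop handles padding and anything else unusual.
    private static int decodeUnits(byte[] src, int sp, int sl,
                                   byte[] dst, int dp, int[] base64) {
        while (sp + 4 <= sl) {
            int b1 = base64[src[sp]     & 0xff];
            int b2 = base64[src[sp + 1] & 0xff];
            int b3 = base64[src[sp + 2] & 0xff];
            int b4 = base64[src[sp + 3] & 0xff];
            if ((b1 | b2 | b3 | b4) < 0)      // '=', CR/LF or illegal byte
                break;                        // -> fall back to per-byte loop
            int bits = b1 << 18 | b2 << 12 | b3 << 6 | b4;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
        }
        return dp;   // caller derives the new source position from the units written
    }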

Base64.Encoder.encode0():

It appears encode0() was already fully optimized in 1.8; I can't get it any
faster :-) I tried using Unsafe.getLong/putLong instead of byte-by-byte access,
but the 8-byte "vectorization" does not bring enough of a speedup; the
performance is the same as the current implementation. See encode00() at
http://cr.openjdk.java.net/~sherman/8164278/webrev.00

Base64.Encoder.wrap(OutputStream)/EncOutputStream.write():

If my memory serves me right, the current implementation was written under the
assumption that the underlying output stream is probably buffered (if the
invoker cares), so it would be redundant for EncOutputStream to buffer bytes
again. That assumption appears to be wrong: it is much slower to write the 4
encoded bytes separately than to bundle them together in a byte[4] and write
them to the underlying stream in one call, even when the underlying output
stream is a ByteArrayOutputStream.

Again, the proposed change is to add a fast-path loop, as we do in encode0(),
to encode/write a block of 3-byte -> 4-byte units at a time. This fast loop
appears to help the compiler optimize away some bounds checks and is therefore
much faster; a rough sketch of the idea is shown below.

The JMH result Base64BM.encodeOS suggests the new implementation is almost 4
times faster and is almost on par with JavaMail's stream encoder.
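
Just to illustrate the point (made-up names, not the webrev code): the win comes
from one out.write(byte[], int, int) call per unit instead of four
out.write(int) calls.

    private final byte[] quad = new byte[4];   // hypothetical scratch buffer

    // base64 is the usual 64-char alphabet table; b0..b2 are the next
    // three input bytes to encode.
    private void writeUnit(OutputStream out, char[] base64,
                           int b0, int b1, int b2) throws IOException {
        int bits = (b0 & 0xff) << 16 | (b1 & 0xff) << 8 | (b2 & 0xff);
        quad[0] = (byte) base64[(bits >>> 18) & 0x3f];
        quad[1] = (byte) base64[(bits >>> 12) & 0x3f];
        quad[2] = (byte) base64[(bits >>> 6)  & 0x3f];
        quad[3] = (byte) base64[bits & 0x3f];
        out.write(quad, 0, 4);                 // one call instead of four
    }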

Base64.Decoder.wrap(InputStream)/DecInputStream.read():

Same approach as for decode0(): add a fast path that decodes a block of 4-byte
units together (a rough sketch follows below).

The JMH result Base64BM.decodeOS (the name probably should be decodeIS, for
InputStream, but anyway...) shows the proposed implementation is 4 times faster
than the existing one and twice as fast as the JavaMail implementation
(Base64BM.decodeOS_javamail).
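
Inside read(byte[] b, int off, int len) the fast path looks roughly like the
following (again only a sketch with made-up names: srcBuf is a new internal
buffer, "is" is the underlying input stream, base64 is the lookup table). The
is.read(srcBuf, ...) call on the first line is where the over-reading concern
below comes from.

    int n = is.read(srcBuf, 0, srcBuf.length);  // may buffer past the '=' padding
    int sp = 0;
    while (sp + 4 <= n && len >= 3) {
        int bits = base64[srcBuf[sp]     & 0xff] << 18 |
                   base64[srcBuf[sp + 1] & 0xff] << 12 |
                   base64[srcBuf[sp + 2] & 0xff] << 6  |
                   base64[srcBuf[sp + 3] & 0xff];
        if (bits < 0)               // '=', CR/LF or illegal byte in this unit
            break;                  // -> the existing per-byte code takes over
        b[off++] = (byte) (bits >> 16);
        b[off++] = (byte) (bits >> 8);
        b[off++] = (byte) bits;
        sp += 4;
        len -= 3;
    }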

However, there is a side effect of adding a buffering mechanism to
DecInputStream. The current implementation reads bytes from the underlying
stream one by one and never reads more bytes than it needs, which means it
stops at the last byte it needs to decode when "=" is present in the stream.
With buffering, it is possible that more bytes (after the "=", which indicates
the end of the base64 stream) get read/consumed and buffered. Is this a
concern? If it is, the only alternative might be to add a separate method to
support this "faster buffered decoder". The small example below shows the kind
of usage that would notice the difference.
Thanks,
Sherman