Re: RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Lance Andersen Thu, 23 Apr 2020 04:26:26 -0700

Hi Volker

> On Apr 22, 2020, at 4:08 PM, Volker Simonis <[email protected]> wrote:
> 
> On Tue, Apr 21, 2020 at 5:23 PM Lance Andersen
> <[email protected]> wrote:
>> 
>> Hi Volker,
>> 
>> I think overall this looks OK.  I went through the older SCCS histories to 
>> see if I could figure out why they were using 512 for the input length but 
>> could not find anything that might shed some light for me.
>> 
> 
> Hi Lance,
> 
> thanks a lot for digging in the old sources to review my change. It's
> great that we stil have people who can use SCCS :)


:-)
> 
>> I am not sure you can guarantee that src.zip exists but others might be able 
>> to weigh in here.  What we have been trying to do going forward is to have 
>> the tests create  the zip files that it needs.  In some cases, we have 
>> created a byte array within the test which represents the zip and just write 
>> it out before the test begins.
>> 
> 
> Yes, the dependency on an external file was not nice, so I rewrote the
> benchmark to programmatically create a file which can be compressed by
> a factor of ~6:
> 
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.02/
> 
> Notice that this new version only changes the microbenchmark, all the
> other files are untouched.
> 
> As everybody seemed to be happy with the change itself and the
> regression test, I'm now waiting for your and Clae's final review of
> the microbenchmark before pushing. Please let me know hat you think?

I think you are good to go from my POV.

Thank you for your contribution to improving Zip performance.

Best,
Lance
> 
> Best regards,
> Volker
> 
>> I am hoping others with more history might also chime in case they are aware 
>> of the history here.
>> 
>> Thank you for helping improve the performance.
>> 
>> Best
>> Lance
>> 
>> On Apr 17, 2020, at 6:49 AM, Volker Simonis <[email protected]> wrote:
>> 
>> Thanks everybody for looking at this change!
>> 
>> Please find an updated webrev (with a new test and micro-benchmark)
>> and my answers to your comments below:
>> 
>> http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.01/
>> 
>> On Thu, Apr 16, 2020 at 6:40 AM Vyom Tiwari <[email protected]> wrote:
>> 
>> Thanks for doing this, i think the below code change is not required .
>> 
>> Please do let me know if i am not reading it correctly.
>> 
>> if (inf.finished() || (len == 0)/* no more input */) {
>> 
>> If you check the native code "inf.finished() will be true only of the 
>> corresponding native call inflate(strm, Z_PARTIAL_FLUSH) returns 
>> "Z_STREAM_END".
>> 
>> After your code change  write may return even if finished() is not true.
>> 
>> 
>> Yes, that's true, but that's what we must do if there's no more input
>> available. Before my change this break on "len < 1" was in the "if
>> (inf.needsInput())" branch.
>> 
>> On Thu, Apr 16, 2020 at 8:22 AM Thomas Stüfe <[email protected]> wrote:
>> 
>> 252                 // Check the decompressor
>> 253                 if (inf.finished() || (len == 0)/* no more input */) {
>> 254                     break;
>> 255                 }
>> 
>> Not sure but I think this is wrong because now you bypass the followup 
>> handling of inf.needsDirectory.
>> 
>> Inflater.inflate() returns 0 if either needsInput or needsDirectory. We have 
>> to ignore needsInput since we have no input anymore, but needsDirectory has 
>> to be handled, no?
>> 
>> 
>> You're absolutely right Thomas. Thanks for catching this! I've moved
>> the check for "needsDictionary" in front of the "finished() || len ==
>> 0" check.
>> 
>> Unfortunately there is not very good test coverage for zip with preset
>> dictionaries (jtreg and submit repo passed without problems). So I
>> added a new test for this use case to "
>> test/jdk/java/util/zip/DeflateIn_InflateOut.java".
>> 
>> On Thu, Apr 16, 2020 at 8:48 AM Thomas Stüfe <[email protected]> wrote:
>> 
>> 
>> As for increasing the buffer size, it makes sense but IMHO needs a CSR and a 
>> release note.
>> 
>> 
>> I don't think so. This is an internal, implementation specific setting
>> which has never been exposed or documented before so I don't see why
>> we should document it now. But let's discuss this separately in the
>> corresponding JBS issue (see below).
>> 
>> On Thu, Apr 16, 2020 at 1:18 PM Claes Redestad
>> <[email protected]> wrote:
>> 
>> 
>> Hi Volker,
>> 
>> On 2020-04-15 19:48, Volker Simonis wrote:
>> 
>> While doing this change, I've also realized that all the streams in
>> java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
>> GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
>> internal byte buffer of 512 bytes by default. Looking at the benchmark
>> attached to JDK-8242864, I think that increasing this default to a
>> bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
>> read and write operations on these streams when they are created with
>> the default buffer size. I think the value "512" is a legacy of old
>> times when memory was more precious:)  so  I've opened "JDK-8242864:
>> Increase the default, internal buffer size of the Streams in
>> java.util.zip" to track that as as separate issue. Do you think it
>> makes sense to increase that default value?
>> 
>> 
>> Seems reasonable. 8192 seems to be the buffer size we've been converging
>> on elsewhere (see InputStream, BufferedInputStream, Files, ..). I also
>> 
>> 
>> That seems reasonable. Alan commented on the JBS issue so we can
>> continue the discussion there.
>> 
>> found an instance of 8096, which is either a typo or a clever
>> optimization to keep the array + array object header fit snugly within
>> 8Kb. You chose. :-)
>> 
>> 
>> I like how you try to be positive :)
>> 
>> 
>> Thank you and best regards,
>> Volker
>> 
>> PS: do you think it makes sense to contribute the benchmark attached
>> to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
>> 
>> [1]http://openjdk.java.net/projects/code-tools/jmh-jdk-microbenchmarks/
>> 
>> 
>> I'd definitely welcome the micro as part of the patch under
>> test/micro/org/openjdk/bench/java/util/zip - additionally contributing
>> 
>> 
>> I knew that "jmh-jdk-microbenchmarks" has been copied to the jdk repo
>> but somehow I did found "jmh-jdk-microbenchmarks" before looking in
>> the obvious place :)
>> 
>> So I've added the benchmark to the patch now. There's one thing I'm
>> not sure however. The benchmark requires a "big" (several 100k) with
>> good compression ratio (e.g. a large text file). I've decided to use a
>> big Java source file from "src.zip" but I'm not sure if "src.zip" is
>> always available in the jdk images which are used to run the
>> microbenchmarks. Do you think the test it is fine this way or do you
>> have a better idea? I saw that "ZipFind" uses "microbenchmarks.jar"
>> (i.e. the container of the test itself) but that file is already
>> compressed so the compression rate won't be that good.
>> 
>> Another thing I couldn't figure out is a good way to skip the
>> benchmark when I realize that I can't load the expected file in the
>> "@Setup" method. Do you now anything better than just throwing an
>> exception?
>> 
>> Thank you and best regards,
>> Volker
>> 
>> to jmh-jdk-microbenchmarks could enable you to test the micro on 8 or
>> 11.
>> 
>> /Claes
>> 
>> 
>> 
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> [email protected]
>> 
>> 
>> 

 <http://oracle.com/us/design/oracle-email-sig-198324.gif>
 <http://oracle.com/us/design/oracle-email-sig-198324.gif> 
<http://oracle.com/us/design/oracle-email-sig-198324.gif>
 <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen| 
Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
[email protected] <mailto:[email protected]>

Re: RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Reply via email to