[jira] [Comment Edited] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Uwe Schindler (Jira) Sun, 19 Sep 2021 14:25:04 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417402#comment-17417402
 ]


Uwe Schindler edited comment on LUCENE-10112 at 9/19/21, 9:24 PM:
------------------------------------------------------------------

Hi,

after reading the LZ4 java library on github I came to the same conclusion, see 
README @ https://github.com/lz4/lz4-java#compatibility-notes:

bq. Compressors might not generate the same compressed streams on all 
platforms, especially if CPU endianness differs, but the compressed streams can 
be safely decompressed by any decompressor implementation on any platform.

So for speed reasons I'd stay with the patch (but we can use 
`DataOutput#writeShort()`as suggested - because it is now defined to be little 
endian).

Alternative: If we want to change to LE in Lucene 9, then I would simply delay 
this issue until LUCENE-10113 is done and use the `BitUtil#VH_LE_INT` static 
field to access the reuseable var handle to do this.

So short question: be as fast as possible and allow different LZ4 output on 
different platforms? Or change to LE?

I would agree to first one, because we have no requirement to produce same 
index format for every platform endianness. We have at other places already 
some markers in data files (Robert found those in `Lucene90PostingsWriter`).


was (Author: thetaphi):
Hi,

after reading the LZ4 java library on github I came to the same conclusion, see 
README @ https://github.com/lz4/lz4-java#compatibility-notes:

> Compressors might not generate the same compressed streams on all platforms, 
> especially if CPU endianness differs, but the compressed streams can be 
> safely decompressed by any decompressor implementation on any platform.

So for speed reasons I'd stay with the patch (but we can use 
`DataOutput#writeShort()`as suggested - because it is now defined to be little 
endian).

Alternative: If we want to change to LE in Lucene 9, then I would simply delay 
this issue until LUCENE-10113 is done and use the `BitUtil#VH_LE_INT` static 
field to access the reuseable var handle to do this.

So short question: be as fast as possible and allow different LZ4 output on 
different platforms? Or change to LE?

I would agree to first one, because we have no requirement to produce same 
index format for every platform endianness. We have at other places already 
some markers in data files (Robert found those in `Lucene90PostingsWriter`).

> Improve LZ4 Compression performance with direct primitive read/writes
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-10112
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10112
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Brooks
>            Assignee: Uwe Schindler
>            Priority: Minor
>         Attachments: LUCENE-10112.patch
>
>
> *Summary*
> Java9 introduced VarHandles as a tool to quickly read and write primitive 
> types directly to byte arrays without bound checks. The LZ4 compressor class 
> must consistently read ints from a byte array to analyze matches. The 
> performance can be improved by reading these using a VarHandle.
> Additionally, the LZ4 compressor/decompressor methods currently individually 
> read/write the bytes for LE shorts. Lucene's DataOutput/DataInput 
> abstractions already have dedicated methods for reading/writing LE shorts. 
> These methods are selectively optimized in certain implementations and will 
> provide superior performance than individual byte reads.
> *Concerns*
> The DataOutput/DataInput readShort() and writeShort() methods do not call out 
> that they are LE. It just looks to me that the DataOutput/DataInput are LE? 
> Since this particular change does not appear to provide significant 
> performance wins, maybe the patch is better leaving the explicit individual 
> byte reads?
> Additionally, this patch changes read ints to read them in the platform 
> native order which should be fine since it is just matching bytes. But I can 
> change it to only read in the order the previous version did.
> *Benchmarks*
> I created JMH benchmarks which compresses 1MB of highly compressible JSON 
> observability data. And compresses it 64KB at a time. In order to simulate 
> the "short" changes, I use a forked version `ByteArrayDataOutput` which 
> writes shorts using a VarHandle (to simulate fast writes that the ByteBuffer 
> versions would get.) I also ran a benchmark without the short changes, just 
> the reading ints using a VarHandle.
>  
>  
> {noformat}
> Benchmark                                          Mode  Cnt    Score   Error 
>  Units
> MyBenchmark.testCompressLuceneLZ4                 thrpt    9  712.430 ± 3.616 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4Forked           thrpt    9  945.380 ± 4.776 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4ForkedNoShort    thrpt    9  940.812 ± 3.868 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HC               thrpt    9  147.432 ± 4.730 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HCForked         thrpt    9  183.954 ± 2.534 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HCForkedNoShort  thrpt    9  188.065 ± 0.727 
>  ops/s{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Reply via email to