[jira] [Commented] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Uwe Schindler (Jira) Sun, 19 Sep 2021 23:52:04 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417480#comment-17417480
 ]


Uwe Schindler commented on LUCENE-10112:
----------------------------------------

bq. just to clarify, AFAIK using these varhandle methods doesn't do magic 
"without bounds checks". just maybe less of them? e.g., if i want to read a 
long out of a byte[], there really needs to only be one range check, instead of 
eight.

There is another change: It uses one cpu instruction to load the 32bit integer 
and it gets "atomic" (if concurrency would be important - not relevant here). 
With the current code there are not only bounds checks removed, we explicitely 
use one CPU instruction and do not rely on hotspot to combine them.

bq. I really think ByteOrder.nativeOrder() should be in forbidden APIs myself. 

Would be a bad idea regading project Panama when you want to use native APIs 
:-) I just say this :-)

bq. +1 to Adrien's idea to just use LE converter here. The problem is that no 
jenkins is testing lucene on bigendian platforms: by using native order (even 
if we think it is safe), it introduces risks. So personally I would rather see 
explicit endianness.

Actually the current code uses BE, so it is not buggy. To me the risk was only 
if we can read a BE compressed LZ4 stream when using an LE platform, which was 
confirmed above. In short: I'd like to se a performance comparison of purely 
the LZ4 decoder with LE and BE on x86 platform.

> Improve LZ4 Compression performance with direct primitive read/writes
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-10112
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10112
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Brooks
>            Assignee: Uwe Schindler
>            Priority: Minor
>         Attachments: LUCENE-10112.patch
>
>
> *Summary*
> Java9 introduced VarHandles as a tool to quickly read and write primitive 
> types directly to byte arrays without bound checks. The LZ4 compressor class 
> must consistently read ints from a byte array to analyze matches. The 
> performance can be improved by reading these using a VarHandle.
> Additionally, the LZ4 compressor/decompressor methods currently individually 
> read/write the bytes for LE shorts. Lucene's DataOutput/DataInput 
> abstractions already have dedicated methods for reading/writing LE shorts. 
> These methods are selectively optimized in certain implementations and will 
> provide superior performance than individual byte reads.
> *Concerns*
> The DataOutput/DataInput readShort() and writeShort() methods do not call out 
> that they are LE. It just looks to me that the DataOutput/DataInput are LE? 
> Since this particular change does not appear to provide significant 
> performance wins, maybe the patch is better leaving the explicit individual 
> byte reads?
> Additionally, this patch changes read ints to read them in the platform 
> native order which should be fine since it is just matching bytes. But I can 
> change it to only read in the order the previous version did.
> *Benchmarks*
> I created JMH benchmarks which compresses 1MB of highly compressible JSON 
> observability data. And compresses it 64KB at a time. In order to simulate 
> the "short" changes, I use a forked version `ByteArrayDataOutput` which 
> writes shorts using a VarHandle (to simulate fast writes that the ByteBuffer 
> versions would get.) I also ran a benchmark without the short changes, just 
> the reading ints using a VarHandle.
>  
>  
> {noformat}
> Benchmark                                          Mode  Cnt    Score   Error 
>  Units
> MyBenchmark.testCompressLuceneLZ4                 thrpt    9  712.430 ± 3.616 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4Forked           thrpt    9  945.380 ± 4.776 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4ForkedNoShort    thrpt    9  940.812 ± 3.868 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HC               thrpt    9  147.432 ± 4.730 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HCForked         thrpt    9  183.954 ± 2.534 
>  ops/s
> MyBenchmark.testCompressLuceneLZ4HCForkedNoShort  thrpt    9  188.065 ± 0.727 
>  ops/s{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-10112) Improve LZ4 Compression performance with direct primitive read/writes

Reply via email to