[
https://issues.apache.org/jira/browse/LUCENE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417480#comment-17417480
]
Uwe Schindler edited comment on LUCENE-10112 at 9/20/21, 6:52 AM:
------------------------------------------------------------------
bq. just to clarify, AFAIK using these varhandle methods doesn't do magic
"without bounds checks". just maybe less of them? e.g., if i want to read a
long out of a byte[], there really needs to only be one range check, instead of
eight.
There is another change: It uses one cpu instruction to load the 32bit integer
and it gets "atomic" (if concurrency would be important - not relevant here).
With the current code there are not only bounds checks removed, we explicitely
use one CPU instruction and do not rely on hotspot to combine them.
bq. I really think ByteOrder.nativeOrder() should be in forbidden APIs myself.
Would be a bad idea regading project Panama when you want to use native APIs
:-) I just say this :-)
bq. +1 to Adrien's idea to just use LE converter here. The problem is that no
jenkins is testing lucene on bigendian platforms: by using native order (even
if we think it is safe), it introduces risks. So personally I would rather see
explicit endianness.
Actually the current code uses BE, so it is not buggy. To me the risk was only
if we can read a BE compressed LZ4 stream when using an LE platform, which was
confirmed above. In short: I'd like to se a performance comparison of purely
the LZ4 decoder with LE and BE on x86 platform.
Theoretically we can test both encodings (this also makes sure that we can read
old BE encoded indexes) and at runtime use platform's endianess.
was (Author: thetaphi):
bq. just to clarify, AFAIK using these varhandle methods doesn't do magic
"without bounds checks". just maybe less of them? e.g., if i want to read a
long out of a byte[], there really needs to only be one range check, instead of
eight.
There is another change: It uses one cpu instruction to load the 32bit integer
and it gets "atomic" (if concurrency would be important - not relevant here).
With the current code there are not only bounds checks removed, we explicitely
use one CPU instruction and do not rely on hotspot to combine them.
bq. I really think ByteOrder.nativeOrder() should be in forbidden APIs myself.
Would be a bad idea regading project Panama when you want to use native APIs
:-) I just say this :-)
bq. +1 to Adrien's idea to just use LE converter here. The problem is that no
jenkins is testing lucene on bigendian platforms: by using native order (even
if we think it is safe), it introduces risks. So personally I would rather see
explicit endianness.
Actually the current code uses BE, so it is not buggy. To me the risk was only
if we can read a BE compressed LZ4 stream when using an LE platform, which was
confirmed above. In short: I'd like to se a performance comparison of purely
the LZ4 decoder with LE and BE on x86 platform.
> Improve LZ4 Compression performance with direct primitive read/writes
> ---------------------------------------------------------------------
>
> Key: LUCENE-10112
> URL: https://issues.apache.org/jira/browse/LUCENE-10112
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Tim Brooks
> Assignee: Uwe Schindler
> Priority: Minor
> Attachments: LUCENE-10112.patch
>
>
> *Summary*
> Java9 introduced VarHandles as a tool to quickly read and write primitive
> types directly to byte arrays without bound checks. The LZ4 compressor class
> must consistently read ints from a byte array to analyze matches. The
> performance can be improved by reading these using a VarHandle.
> Additionally, the LZ4 compressor/decompressor methods currently individually
> read/write the bytes for LE shorts. Lucene's DataOutput/DataInput
> abstractions already have dedicated methods for reading/writing LE shorts.
> These methods are selectively optimized in certain implementations and will
> provide superior performance than individual byte reads.
> *Concerns*
> The DataOutput/DataInput readShort() and writeShort() methods do not call out
> that they are LE. It just looks to me that the DataOutput/DataInput are LE?
> Since this particular change does not appear to provide significant
> performance wins, maybe the patch is better leaving the explicit individual
> byte reads?
> Additionally, this patch changes read ints to read them in the platform
> native order which should be fine since it is just matching bytes. But I can
> change it to only read in the order the previous version did.
> *Benchmarks*
> I created JMH benchmarks which compresses 1MB of highly compressible JSON
> observability data. And compresses it 64KB at a time. In order to simulate
> the "short" changes, I use a forked version `ByteArrayDataOutput` which
> writes shorts using a VarHandle (to simulate fast writes that the ByteBuffer
> versions would get.) I also ran a benchmark without the short changes, just
> the reading ints using a VarHandle.
>
>
> {noformat}
> Benchmark Mode Cnt Score Error
> Units
> MyBenchmark.testCompressLuceneLZ4 thrpt 9 712.430 ± 3.616
> ops/s
> MyBenchmark.testCompressLuceneLZ4Forked thrpt 9 945.380 ± 4.776
> ops/s
> MyBenchmark.testCompressLuceneLZ4ForkedNoShort thrpt 9 940.812 ± 3.868
> ops/s
> MyBenchmark.testCompressLuceneLZ4HC thrpt 9 147.432 ± 4.730
> ops/s
> MyBenchmark.testCompressLuceneLZ4HCForked thrpt 9 183.954 ± 2.534
> ops/s
> MyBenchmark.testCompressLuceneLZ4HCForkedNoShort thrpt 9 188.065 ± 0.727
> ops/s{noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]