[
https://issues.apache.org/jira/browse/LUCENE-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417273#comment-17417273
]
Tim Brooks commented on LUCENE-10112:
-------------------------------------
> Maybe provide a pull request to make review easier.
I will look into doing this soon.
> Unrelated to your current patch regarding LZ4: The ByteArrayDataInput and
>ByteArrayDataOutput classes could use the same trick with VarHandles for all
>data types (short, int, long), but there with ByteOrder.LITTLE_ENDIAN (in
>Lucene 9+). This may be an interesting change, too.
Yeah. I have prototyped a patch related to this too that I will submit in a few
days. Just wanted to started with LZ4 as it is an area I have been focused on
recently related to Elasticsearch. And reading a bunch of ints without bound
checks was a significant performance improvement I had noticed with the
lz4-java library.
I'm not sure how common ByteArrayDataInput are used in Lucene. A lot of the
flame graphs I look at in Elasticsearch are indexing into the ByteBuffer
variants (and byte buffers internally already have some of these
optimizations). But since Lucene supports Java9 on 9 there is no reason not to
do the VarHandles for byte arrays.
> Improve LZ4 Compression performance with direct primitive read/writes
> ---------------------------------------------------------------------
>
> Key: LUCENE-10112
> URL: https://issues.apache.org/jira/browse/LUCENE-10112
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Tim Brooks
> Priority: Minor
> Attachments: LUCENE-10112.patch
>
>
> *Summary*
> Java9 introduced VarHandles as a tool to quickly read and write primitive
> types directly to byte arrays without bound checks. The LZ4 compressor class
> must consistently read ints from a byte array to analyze matches. The
> performance can be improved by reading these using a VarHandle.
> Additionally, the LZ4 compressor/decompressor methods currently individually
> read/write the bytes for LE shorts. Lucene's DataOutput/DataInput
> abstractions already have dedicated methods for reading/writing LE shorts.
> These methods are selectively optimized in certain implementations and will
> provide superior performance than individual byte reads.
> *Concerns*
> The DataOutput/DataInput readShort() and writeShort() methods do not call out
> that they are LE. It just looks to me that the DataOutput/DataInput are LE?
> Since this particular change does not appear to provide significant
> performance wins, maybe the patch is better leaving the explicit individual
> byte reads?
> Additionally, this patch changes read ints to read them in the platform
> native order which should be fine since it is just matching bytes. But I can
> change it to only read in the order the previous version did.
> *Benchmarks*
> I created JMH benchmarks which compresses 1MB of highly compressible JSON
> observability data. And compresses it 64KB at a time. In order to simulate
> the "short" changes, I use a forked version `ByteArrayDataOutput` which
> writes shorts using a VarHandle (to simulate fast writes that the ByteBuffer
> versions would get.) I also ran a benchmark without the short changes, just
> the reading ints using a VarHandle.
>
>
> {noformat}
> Benchmark Mode Cnt Score Error
> Units
> MyBenchmark.testCompressLuceneLZ4 thrpt 9 712.430 ± 3.616
> ops/s
> MyBenchmark.testCompressLuceneLZ4Forked thrpt 9 945.380 ± 4.776
> ops/s
> MyBenchmark.testCompressLuceneLZ4ForkedNoShort thrpt 9 940.812 ± 3.868
> ops/s
> MyBenchmark.testCompressLuceneLZ4HC thrpt 9 147.432 ± 4.730
> ops/s
> MyBenchmark.testCompressLuceneLZ4HCForked thrpt 9 183.954 ± 2.534
> ops/s
> MyBenchmark.testCompressLuceneLZ4HCForkedNoShort thrpt 9 188.065 ± 0.727
> ops/s{noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]