[
https://issues.apache.org/jira/browse/LUCENE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Renaud Delbru updated LUCENE-10449:
-----------------------------------
Description:
LUCENE-9211 introduced a compression mechanism for binary doc values, which was
later removed in LUCENE-9843 because it was hurting performance on some
workloads.
However, LUCENE-9843 did not restore the code to its previous state. Instead of
reading the block directly from the {{IndexInput}} as in [1], the
{{decompressBlock()}} call [2] was kept, and (as far as we understand) it now
decompresses a block that is not compressed. The {{decompressBlock}} method
delegates to {{LZ4.decompress}}, which appears to add significant overhead
(e.g., {{readByte}}).
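To make the suspected overhead concrete, here is a minimal, JDK-only sketch. It is not Lucene code: {{BlockReadSketch}} and its methods are hypothetical names, and it assumes the uncompressed block is framed as a single literal-only LZ4 sequence (token byte, optional length bytes, then the raw bytes). Under that assumption, the decoder path has to parse framing bytes one at a time before it can copy the payload, while the direct path is a single bulk copy.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
final class BlockReadSketch {

  // Frames data as one literal-only LZ4 sequence: token, extra length bytes, literals.
  static byte[] frameAsLiterals(byte[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int len = data.length;
    out.write(Math.min(len, 15) << 4); // high nibble = literal length, low nibble = 0
    if (len >= 15) {
      int rest = len - 15;
      while (rest >= 255) {
        out.write(255);
        rest -= 255;
      }
      out.write(rest);
    }
    out.write(data, 0, len);
    return out.toByteArray();
  }

  // Direct path: the raw block is copied with one bulk read.
  static byte[] readDirect(ByteBuffer in, int len) {
    byte[] dest = new byte[len];
    in.get(dest);
    return dest;
  }

  // Decoder path: token and length bytes are read one by one before the copy.
  static byte[] readViaDecoder(ByteBuffer in, int len) {
    byte[] dest = new byte[len];
    int literalLen = (in.get() & 0xFF) >>> 4;
    if (literalLen == 15) {
      int b;
      do {
        b = in.get() & 0xFF;
        literalLen += b;
      } while (b == 255);
    }
    in.get(dest, 0, literalLen);
    return dest;
  }

  public static void main(String[] args) {
    // A UUID rendered as ASCII text stands in for one binary doc value.
    byte[] value = "123e4567-e89b-12d3-a456-426614174000".getBytes(StandardCharsets.US_ASCII);
    byte[] direct = readDirect(ByteBuffer.wrap(value), value.length);
    byte[] decoded = readViaDecoder(ByteBuffer.wrap(frameAsLiterals(value)), value.length);
    System.out.println("same bytes: " + Arrays.equals(direct, decoded));
  }
}
```

Both paths return the same bytes; the difference is the extra per-byte reads and branching in the decoder path, which matter when the method is called once per value on a hot path.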
This has a significant impact on our workloads, which make heavy use of doc
values. It may lead to performance regressions of 2x to 5x. See the samples
below.
{code:java}
❯ times_tasks Elasticsearch 7.10.2 (Lucene 8.7) - no binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.10.2-22.6-SNAPSHOT.json    total                        42        90        45        66
7.10.2-22.6-SNAPSHOT.json    SearchJoinRequest1           14        32        15        18
7.10.2-22.6-SNAPSHOT.json    SearchTaskBroadcastRequest2  23        53        27        43

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - with binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.17.0-27.1-SNAPSHOT.json    total                        253       327       285       310
7.17.0-27.1-SNAPSHOT.json    SearchJoinRequest1           121       154       142       152
7.17.0-27.1-SNAPSHOT.json    SearchTaskBroadcastRequest2  122       173       140       152

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - lucene_default codec is used to bypass the binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.17.0-27.1-SNAPSHOT.json.2  total                        48        96        63        75
7.17.0-27.1-SNAPSHOT.json.2  SearchJoinRequest1           19        44        25        31
7.17.0-27.1-SNAPSHOT.json.2  SearchTaskBroadcastRequest2  23        42        29        37

❯ times_tasks Elasticsearch 8.0 (Lucene 9.0) - no binary compression
name                         type                         time_min  time_max  time_p50  time_p90
8.0.0-28.0-SNAPSHOT.json     total                        260       327       287       313
8.0.0-28.0-SNAPSHOT.json     SearchJoinRequest1           122       168       148       158
8.0.0-28.0-SNAPSHOT.json     SearchTaskBroadcastRequest2  123       165       139       155
{code}
We can clearly see that in Lucene 9.0, even after the removal of binary doc
values compression, performance did not improve. Profiling the execution
indicates that the bottleneck is {{LZ4.decompress}}. We have attached two
flamegraph screenshots.
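As a rough way to read the samples, the sketch below computes the slowdown of each run relative to the 7.10.2 baseline, using only the {{time_p50}} values from the {{total}} rows. The class and method names are hypothetical, and other rows or percentiles may give different ratios.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the input numbers are the time_p50 "total" values from the samples.
final class SlowdownSketch {

  // Ratio of a run's median time to the baseline median time.
  static double slowdown(int baselineP50, int runP50) {
    return (double) runP50 / baselineP50;
  }

  public static void main(String[] args) {
    int baseline = 45; // Elasticsearch 7.10.2 (Lucene 8.7), no binary compression

    Map<String, Integer> runs = new LinkedHashMap<>();
    runs.put("7.17.1 (Lucene 8.11), with binary compression", 285);
    runs.put("7.17.1 (Lucene 8.11), lucene_default codec", 63);
    runs.put("8.0 (Lucene 9.0), no binary compression", 287);

    for (Map.Entry<String, Integer> run : runs.entrySet()) {
      System.out.printf("%-48s %.1fx%n", run.getKey(), slowdown(baseline, run.getValue()));
    }
  }
}
```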
The CPU time of the {{TermsDict.next}} method in Lucene 8.11 without
compression is around 2 seconds, while the CPU time of the same method in
Lucene 9.0 is 12 seconds. This was measured with a small benchmark that reads
a binary doc values field a fixed number of times. Each document is created
with a single binary value that represents a UUID.
[1]
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.0/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L1159]
[2]
[https://github.com/apache/lucene/commit/a7a02519f0a5652110a186f4909347ac3349092d#diff-ab443662a6310fda675a4bd6d01fabf3a38c4c825ec2acef8f9a34af79f0b252R1022]
> Unnecessary ByteArrayDataInput introduced with compression on binary doc
> values introduced
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-10449
> URL: https://issues.apache.org/jira/browse/LUCENE-10449
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Affects Versions: 9.0
> Reporter: Renaud Delbru
> Priority: Major
> Attachments: lucene-8.11-no-compression.png, lucene-9.png
>