[jira] [Commented] (LUCENE-10449) Performance regression due to decompressBlock method introduced with compression on binary doc values

Renaud Delbru (Jira) Wed, 02 Mar 2022 06:18:13 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500191#comment-17500191
 ]


Renaud Delbru commented on LUCENE-10449:
----------------------------------------

Hi [~rcmuir] 

Thanks for your observation and advice. 

After considering your comment, we realized that our description was a bit 
confusing and incorrect (I'll update the title and description). The 
performance regression we observed is on the SortedSetDocValues and more 
specifically on the term dictionary code. At the moment, there isn't an option 
for us to switch to BinaryDocValues as you suggested. Elasticsearch is using 
SortedSetDocValues for keyword field type which is used by default for many 
fields, like IDs.

We are aware that our workload is particular, and it may not be common in the 
"search" domain. Our need in terms of data access pattern is to scan and 
retrieve a large number of keyword (binary) values. We see that the change to 
force LZ4 compression in the terms dict is quite impactful for such a data 
access pattern. The implementation without LZ4, while not optimal (I agree with 
you that a BinaryDocValues approach would be better), was relatively fast for 
our needs. However, we are now confronted to a significant perf impact with 
9843.

We understand that reverting back to a design with compression option is not an 
option. A potential solution would be optimize the LZ4 compression logic to 
minimize the impact. We are not sure if this is entirely feasible, but we were 
wondering if the compression logic could detect if there was no actual 
compression done on the terms dict block, e.g., when uncompressedLength == 
compressedLength in 
[https://github.com/apache/lucene/blob/a7a02519f0a5652110a186f4909347ac3349092d/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L597,]
 and in that case fall back to a simpler decoding logic.

Another solution would be like the one you propose, i.e., implementing a 
specific field type on top of the BinaryDocValues (this is what was done for 
vector and wildcard field type) which would support the encoding of multiple 
IDs in one binary value. However, this would impact usability as a user would 
not have the best perf out of the box, he will have to carefully define is 
schema beforehand. In addition, this is more an issue regarding Elasticsearch 
than Lucene.

> Performance regression due to decompressBlock method introduced with 
> compression on binary doc values 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10449
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10449
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 9.0
>            Reporter: Renaud Delbru
>            Priority: Major
>         Attachments: lucene-8.11-no-compression.png, lucene-9.png
>
>
> LUCENE-9211 introduced a compression mechanism for binary doc values, which 
> was then removed at a later stage in LUCENE-9843 as it was impacting 
> performance on some workload.
> However, LUCENE-9843 didn't revert the code as it was prior to that. Instead 
> of reading the block directly from the {{IndexInput}} as in [1], the 
> {{decompressBlock()}} call [2] is kept which is decompressing a non-compress 
> block (from our understanding). The {{decompressBlock}} method deleguates to 
> {{LZ4.decompress }}and it looks like this is adding a significant overhead 
> (e.g., {{{}readByte{}}}).
> This has quite an impact on our workloads which heavily uses doc values. It 
> may lead to perf regression from 2x up to 5x. See samples below.
>  
> {code:java}
> ❯ times_tasks Elasticsearch 7.10.2 (Lucene 8.7) - no binary compression
> name                      type                        time_min time_max 
> time_p50 time_p90
> 7.10.2-22.6-SNAPSHOT.json total                       42       90       45    
>    66
> 7.10.2-22.6-SNAPSHOT.json SearchJoinRequest1          14       32       15    
>    18
> 7.10.2-22.6-SNAPSHOT.json SearchTaskBroadcastRequest2 23       53       27    
>    43
> ❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - with binary compression
> name                      type                        time_min time_max 
> time_p50 time_p90
> 7.17.0-27.1-SNAPSHOT.json total                       253      327      285   
>    310
> 7.17.0-27.1-SNAPSHOT.json SearchJoinRequest1          121      154      142   
>    152
> 7.17.0-27.1-SNAPSHOT.json SearchTaskBroadcastRequest2 122      173      140   
>    152
> ❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - lucene_default codec is 
> used to bypass the binary compression 
> name                        type                        time_min time_max 
> time_p50 time_p90
> 7.17.0-27.1-SNAPSHOT.json.2 total                       48       96       63  
>      75
> 7.17.0-27.1-SNAPSHOT.json.2 SearchJoinRequest1          19       44       25  
>      31
> 7.17.0-27.1-SNAPSHOT.json.2 SearchTaskBroadcastRequest2 23       42       29  
>      37
> ❯ times_tasks Elasticsearch 8.0 (Lucene 9.0) - no binary compression
> name                     type                        time_min time_max 
> time_p50 time_p90
> 8.0.0-28.0-SNAPSHOT.json total                       260      327      287    
>   313
> 8.0.0-28.0-SNAPSHOT.json SearchJoinRequest1          122      168      148    
>   158
> 8.0.0-28.0-SNAPSHOT.json SearchTaskBroadcastRequest2 123      165      139    
>   155{code}
> We can clearly see that in Lucene 9.0, even after the removal of the binary 
> doc values compression, the performance didn't improve. Profiling the 
> execution indicates that the bottleneck is the {{{}LZ4.decompress{}}}. We 
> have attached two screenshots of a flamegraph.
> The CPU time of the {{TermsDict.next}} method with Lucene 8.11 with no 
> compression is around 2 seconds, while the CPU time of the same method in 
> Lucene 9.0 is 12 seconds. This was measured on a small benchmark reading a 
> fixed number of times a binary doc values field. Each document is created 
> with a single binary value that represents a UUID. 
>  
> [1] 
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.0/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L1159]
> [2] 
> [https://github.com/apache/lucene/commit/a7a02519f0a5652110a186f4909347ac3349092d#diff-ab443662a6310fda675a4bd6d01fabf3a38c4c825ec2acef8f9a34af79f0b252R1022]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-10449) Performance regression due to decompressBlock method introduced with compression on binary doc values

Reply via email to