[ 
https://issues.apache.org/jira/browse/LUCENE-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-10572:
----------------------------------------
    Description: 
I was poking around in our nightly benchmarks 
([https://home.apache.org/~mikemccand/lucenebench]) and noticed in the JFR 
profiling that the hottest method is this:
{noformat}
PERCENT       CPU SAMPLES   STACK
9.28%         53848         org.apache.lucene.util.BytesRefHash#equals()
                              at org.apache.lucene.util.BytesRefHash#findHash()
                              at org.apache.lucene.util.BytesRefHash#add()
                              at org.apache.lucene.index.TermsHashPerField#add()
                              at org.apache.lucene.index.IndexingChain$PerField#invert()
                              at org.apache.lucene.index.IndexingChain#processField()
                              at org.apache.lucene.index.IndexingChain#processDocument()
                              at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments()
{noformat}
This is kinda crazy: checking whether the term being inserted into the inverted index hash equals a term already added to {{BytesRefHash}} is the hottest method during nightly benchmarks.

Discussing this offline with [~rcmuir] and [~jpountz], we noticed a few questionable things about our current implementation:
 * Why are we using a 1 or 2 byte {{vInt}} to encode the length of the inserted term into the hash?  Let's just always use two bytes, since IW limits term length to ~32 K, well under the 64 K an unsigned short can cover (see the length-prefix sketch after this list)

 * Why are we doing byte swapping in this deep hotspot using {{VarHandles}} ({{BitUtil.VH_BE_SHORT.get}})?  The same sketch after this list shows where that read happens.

 * Is it possible our growth strategy for {{BytesRefHash}} (on rehash) is not aggressive enough?  Or is the initial sizing of the hash too small?  (The rehash-count sketch after this list makes the cost concrete.)

 * Maybe {{MurmurHash}} is not great (causing too many collisions, and therefore too many {{equals}} calls)?  {{Fnv}} and {{xxhash}} are possible "upgrades" (the collision-count sketch after this list shows where a candidate would plug in)

 * If we stick with {{MurmurHash}}, why are we using the 32 bit version ({{murmurhash3_x86_32}})?

 * Are we using the JVM's intrinsics to compare multiple bytes in a single SIMD instruction ([~rcmuir] is quite sure we are indeed)?  The {{Arrays.equals}} sketch after this list shows the intrinsified range comparison we would want to be hitting.

 * [~jpountz] suggested maybe the hash insert is simply memory bound

 * {{TermsHashPerField.writeByte}} is also depressingly slow (~5% of total CPU 
cost)
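
To make the first two points concrete, here is a minimal sketch (the layout and names are illustrative, not the actual {{BytesRefHash}} code): the 1-or-2-byte vInt-style prefix needs a branch plus a big-endian {{VarHandle}} read, while an unconditional 2-byte prefix needs neither:
{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Illustrative only: compare a 1-or-2-byte vInt-style length prefix against an
// unconditional 2-byte prefix for the term stored in a byte block.
public class LengthPrefixSketch {

  // Stand-in for BitUtil.VH_BE_SHORT: a big-endian short view over a byte[].
  static final VarHandle BE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.BIG_ENDIAN);

  // vInt-style decode: 1 byte if length < 128, otherwise 2 bytes with the high bit as a marker.
  static int readVIntLength(byte[] block, int pos) {
    int b = block[pos] & 0xFF;
    if ((b & 0x80) == 0) {
      return b; // short form: single length byte
    }
    // long form: branch taken, plus a byte-swapping read on little-endian CPUs
    return ((short) BE_SHORT.get(block, pos)) & 0x7FFF;
  }

  // Fixed-width decode: always 2 bytes, branch-free, explicit byte order.
  static int readFixedLength(byte[] block, int pos) {
    return ((block[pos] & 0xFF) << 8) | (block[pos + 1] & 0xFF);
  }

  public static void main(String[] args) {
    byte[] block = new byte[2];
    BE_SHORT.set(block, 0, (short) (300 | 0x8000)); // long-form vInt encoding of length 300
    System.out.println(readVIntLength(block, 0));   // 300
    block[0] = (byte) (300 >>> 8);
    block[1] = (byte) 300;                          // fixed 2-byte encoding of length 300
    System.out.println(readFixedLength(block, 0));  // 300
  }
}
{code}
The fixed-width read costs one extra byte per term shorter than 128 bytes, but it drops the branch and the byte swapping from this hot path.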
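
For the growth-strategy question, a back-of-the-envelope sketch (the initial capacity of 16 and the 50% load threshold are assumptions for illustration, not necessarily the exact {{BytesRefHash}} defaults):
{code:java}
// Count the doubling rehashes needed before the table can hold N unique terms.
public class RehashCountSketch {
  public static void main(String[] args) {
    int uniqueTerms = 1_000_000;
    int capacity = 16;   // assumed initial capacity
    int rehashes = 0;
    while (capacity / 2 < uniqueTerms) { // assumed: grow once the table is half full
      capacity *= 2;                     // current strategy: double on each rehash
      rehashes++;
    }
    System.out.println("capacity=" + capacity + " after " + rehashes + " rehashes");
  }
}
{code}
Every rehash re-hashes all existing entries, so a larger initial capacity or a more aggressive growth factor trades a bit of memory for fewer of these full passes.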
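
For the MurmurHash questions, something like this tiny harness (not existing benchmark code; the corpus, table size and seed are placeholders) could tell us whether the 32-bit hash is really producing enough slot collisions to explain the {{equals}} cost:
{code:java}
import java.util.List;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.StringHelper;

// Count how often distinct terms land in an already-occupied slot of a
// power-of-two table, using the same 32-bit MurmurHash3 BytesRefHash uses today.
public class HashCollisionSketch {
  public static void main(String[] args) {
    List<String> terms = List.of("foo", "bar", "baz", "qux"); // stand-in corpus
    int tableSize = 1 << 16;
    int mask = tableSize - 1;
    boolean[] occupied = new boolean[tableSize];
    int collisions = 0;
    for (String term : terms) {
      // An Fnv or xxhash candidate would be swapped in on this line.
      int hash = StringHelper.murmurhash3_x86_32(new BytesRef(term), 0);
      int slot = hash & mask;
      if (occupied[slot]) {
        collisions++; // in BytesRefHash a full slot means an equals() call plus further probing
      }
      occupied[slot] = true;
    }
    System.out.println("collisions: " + collisions + " / " + terms.size());
  }
}
{code}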
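
On the SIMD question: the range-based {{java.util.Arrays.equals}}/{{Arrays.mismatch}} overloads are backed by a HotSpot intrinsic, so a comparison written as below should vectorize; whether the current {{BytesRefHash#equals}} path actually bottoms out there is what we would want to confirm (the field names here are illustrative, not the real ones):
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Compare a term stored in a byte block against a candidate term using the
// range-based Arrays.equals, which the JIT can compile to wide SIMD compares.
public class TermEqualsSketch {

  static boolean termEquals(byte[] block, int offset, int length,
                            byte[] candidate, int candOffset, int candLength) {
    return Arrays.equals(block, offset, offset + length,
                         candidate, candOffset, candOffset + candLength);
  }

  public static void main(String[] args) {
    byte[] pool = "lucene".getBytes(StandardCharsets.UTF_8);
    byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
    System.out.println(termEquals(pool, 0, pool.length, term, 0, term.length)); // true
  }
}
{code}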

I pulled these observations from a recent (5/6/22) profiler output: 
[https://home.apache.org/~mikemccand/lucenebench/2022.05.06.06.33.00.html]

Maybe we can improve our performance on this crazy hotspot?

Or maybe this is a "healthy" hotspot and we should leave it be!



> Can we optimize BytesRefHash?
> -----------------------------
>
>                 Key: LUCENE-10572
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10572
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
