Re: [PR] Improve BytesRefHash.add performance by optimize rehash operation [lucene]

via GitHub Tue, 03 Mar 2026 07:38:33 -0800


mikemccand commented on PR #15779:
URL: https://github.com/apache/lucene/pull/15779#issuecomment-3991869954


   I tested the PR (pre-computing hashes) on a Raptor Lake i9-13900K, 192 GB RAM, Arch Linux.
   
   I don't know what all the perf stats mean, but I see that CPUs_utilized went from 1.4 to 1.7:
   
   Before:
   
   ```
   38092046 terms loaded
   done shuffling
   Inserted 38092046 terms in 12691.45 ms, unique term 38092046
   Inserted 38092046 terms in 12688.31 ms, unique term 38092046
   Inserted 38092046 terms in 12607.45 ms, unique term 38092046
   Inserted 38092046 terms in 12537.87 ms, unique term 38092046
   
    Performance counter stats for '/usr/lib/jvm/java-25-openjdk/bin/java -cp .:lucene/core/build/classes/java/main25:lucene/core/build/classes/java/main BHT /lucenedata/enwiki/allterms-20110115.txt':
   
                 8560      context-switches                 #    108.1 cs/sec  cs_per_second
                  287      cpu-migrations                   #      3.6 migrations/sec  migrations_per_second
                33899      page-faults                      #    428.0 faults/sec  page_faults_per_second
             79211.49 msec task-clock                       #      1.4 CPUs  CPUs_utilized
           2273016325      cpu_core/L1-dcache-load-misses/  #      nan %  l1d_miss_rate            (29.09%)
           1541526215      cpu_core/LLC-loads/              #     73.2 %  llc_miss_rate            (13.84%)
           1405307756      cpu_core/branch-misses/          #      2.8 %  branch_miss_rate         (20.77%)
          49549979427      cpu_core/branches/               #    625.5 M/sec  branch_frequency     (27.68%)
         441832760975      cpu_core/cpu-cycles/             #      5.6 GHz  cycles_frequency       (34.59%)
         283012369585      cpu_core/instructions/           #      0.6 instructions  insn_per_cycle  (41.48%)
          83593449949      cpu_core/dTLB-loads/             #      0.1 %  dtlb_miss_rate           (48.33%)
             88342760      cpu_atom/L1-icache-load-misses/  #      0.7 %  l1i_miss_rate            (17.36%)
            128880274      cpu_atom/LLC-loads/              #      0.2 %  llc_miss_rate            (11.44%)
             85602625      cpu_atom/branch-misses/          #      1.0 %  branch_miss_rate         (8.54%)
           5128977191      cpu_atom/branches/               #     64.8 M/sec  branch_frequency     (13.66%)
          61038440913      cpu_atom/cpu-cycles/             #      0.8 GHz  cycles_frequency       (18.21%)
          34060988052      cpu_atom/instructions/           #      0.6 instructions  insn_per_cycle  (22.70%)
          11407392575      cpu_atom/dTLB-loads/             #      0.0 %  dtlb_miss_rate           (27.22%)
                TopdownL1 (cpu_core)                        #      8.5 %  tma_bad_speculation
                                                            #     12.2 %  tma_frontend_bound       (58.25%)
                                                            #     33.2 %  tma_backend_bound
                                                            #     46.0 %  tma_retiring             (58.25%)
                TopdownL1 (cpu_atom)                        #     81.9 %  tma_backend_bound        (27.06%)
                                                            #      4.2 %  tma_frontend_bound       (19.02%)
                                                            #     -6.3 %  tma_bad_speculation
                                                            #     20.3 %  tma_retiring             (17.47%)
   
         55.165335221 seconds time elapsed
   
         76.435898000 seconds user
          2.268276000 seconds sys
   ```
   
   After:
   
   ```
   38092046 terms loaded
   done shuffling
   Inserted 38092046 terms in 7715.29 ms, unique term 38092046
   Inserted 38092046 terms in 7696.81 ms, unique term 38092046
   Inserted 38092046 terms in 7704.62 ms, unique term 38092046
   Inserted 38092046 terms in 7586.43 ms, unique term 38092046
   
    Performance counter stats for '/usr/lib/jvm/java-25-openjdk/bin/java -cp /l/trunk:lucene/core/build/classes/java/main25:lucene/core/build/classes/java/main BHT /lucenedata/enwiki/allterms-20110115.txt':
   
                 8710      context-switches                 #    147.3 cs/sec  cs_per_second
                  334      cpu-migrations                   #      5.6 migrations/sec  migrations_per_second
                34616      page-faults                      #    585.4 faults/sec  page_faults_per_second
             59128.21 msec task-clock                       #      1.7 CPUs  CPUs_utilized
           1561004563      cpu_core/L1-dcache-load-misses/  #      nan %  l1d_miss_rate            (27.12%)
            984009712      cpu_core/LLC-loads/              #     73.1 %  llc_miss_rate            (14.29%)
           1341577909      cpu_core/branch-misses/          #      2.8 %  branch_miss_rate         (21.76%)
          47205893532      cpu_core/branches/               #    798.4 M/sec  branch_frequency     (28.98%)
         299702122270      cpu_core/cpu-cycles/             #      5.1 GHz  cycles_frequency       (36.20%)
         274395763972      cpu_core/instructions/           #      0.9 instructions  insn_per_cycle  (43.41%)
          85722776251      cpu_core/dTLB-loads/             #      0.1 %  dtlb_miss_rate           (47.47%)
             61455978      cpu_atom/L1-icache-load-misses/  #      0.4 %  l1i_miss_rate            (11.69%)
            165263411      cpu_atom/LLC-loads/              #      0.6 %  llc_miss_rate            (8.66%)
            104379304      cpu_atom/branch-misses/          #      0.9 %  branch_miss_rate         (6.72%)
          12291571297      cpu_atom/branches/               #    207.9 M/sec  branch_frequency     (6.66%)
         123652422399      cpu_atom/cpu-cycles/             #      2.1 GHz  cycles_frequency       (8.85%)
          77079071643      cpu_atom/instructions/           #      0.6 instructions  insn_per_cycle  (11.01%)
          27617125715      cpu_atom/dTLB-loads/             #      0.0 %  dtlb_miss_rate           (11.78%)
                TopdownL1 (cpu_core)                        #      8.5 %  tma_bad_speculation
                                                            #     11.4 %  tma_frontend_bound       (54.24%)
                                                            #     36.5 %  tma_backend_bound
                                                            #     43.6 %  tma_retiring             (54.24%)
                TopdownL1 (cpu_atom)                        #     80.3 %  tma_backend_bound        (11.68%)
                                                            #      2.6 %  tma_frontend_bound       (11.73%)
                                                            #      4.5 %  tma_bad_speculation
                                                            #     12.5 %  tma_retiring             (11.75%)
   
         35.300626738 seconds time elapsed
   
         56.434061000 seconds user
          2.299231000 seconds sys
   ```
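   
   As a rough way to read the two runs above, here is a small sketch that recomputes a few ratios from the reported numbers. The class and method names (`PerfDelta`, `mean`, `ratio`, `ipc`, `cpusUtilized`) are made up for illustration and are not part of the PR or of Lucene. The inputs are copied from the output above. The CPUs_utilized formula (task-clock divided by elapsed wall time) is my understanding of how perf derives that column.
   
   ```java
   import java.util.Arrays;
   import java.util.Locale;
   
   public class PerfDelta {
     // Arithmetic mean of the per-round insert times (ms).
     public static double mean(double[] values) {
       return Arrays.stream(values).average().orElse(Double.NaN);
     }
   
     // How many times larger "before" is than "after" (values above 1 mean "after" is faster).
     public static double ratio(double before, double after) {
       return before / after;
     }
   
     // Instructions retired per cycle.
     public static double ipc(long instructions, long cycles) {
       return (double) instructions / cycles;
     }
   
     // Assumed definition of perf's CPUs_utilized: CPU time consumed / wall-clock time.
     public static double cpusUtilized(double taskClockMs, double elapsedSec) {
       return taskClockMs / (elapsedSec * 1000.0);
     }
   
     public static void main(String[] args) {
       // Insert times (ms) for the four rounds, copied from the two runs.
       double[] beforeMs = {12691.45, 12688.31, 12607.45, 12537.87};
       double[] afterMs = {7715.29, 7696.81, 7704.62, 7586.43};
   
       System.out.printf(Locale.ROOT, "insert speedup: %.2fx%n",
           ratio(mean(beforeMs), mean(afterMs)));
   
       // cpu_core instructions and cpu-cycles counters.
       System.out.printf(Locale.ROOT, "P-core IPC: %.2f -> %.2f%n",
           ipc(283012369585L, 441832760975L),
           ipc(274395763972L, 299702122270L));
   
       // task-clock (ms) and "seconds time elapsed".
       System.out.printf(Locale.ROOT, "CPUs utilized: %.1f -> %.1f%n",
           cpusUtilized(79211.49, 55.165335221),
           cpusUtilized(59128.21, 35.300626738));
     }
   }
   ```
   
   With these inputs the insert loop comes out roughly 1.6x faster and the P-core IPC goes from about 0.64 to about 0.92, while the instruction count barely changes. The computed CPUs_utilized values round to the 1.4 and 1.7 that perf printed, so that change seems to come from wall time dropping faster than total CPU time.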
   
   This is on the latest Lucene `main` branch (#182ee9c4cc3bc52ace12e699248b750377a3aa2f) using your benchy (I just added code to load terms from a file, one per line).  I tested on an export of terms from Wikipedia `en`:
   
   ```
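   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   import org.apache.lucene.util.BytesRef;
   import org.apache.lucene.util.BytesRefHash;
   
   // Compile and run:
   //   /usr/lib/jvm/java-25-openjdk/bin/javac -cp lucene/core/build/classes/java/main25:lucene/core/build/classes/java/main BHT.java
   //   perf stat -dd /usr/lib/jvm/java-25-openjdk/bin/java -cp .:lucene/core/build/classes/java/main25:lucene/core/build/classes/java/main BHT /lucenedata/enwiki/allterms-20110115.txt
   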
   public class BHT {
     public static void main(String[] args) throws IOException {
       BytesRef[] terms = loadTerms(Paths.get(args[0]));
       for (int iter=0;iter<1;iter++) {
         insert(terms, 4);
       }
     }
   
     private static BytesRef[] loadTerms(Path path) throws IOException {
   
       final List<BytesRef> terms = new ArrayList<>();
       try (java.util.stream.Stream<String> lines = Files.lines(path)) {
         // Process each line as it is read
         lines.forEach(line -> terms.add(new BytesRef(line.trim())));
       }
       System.out.println(terms.size() + " terms loaded");
       Collections.shuffle(terms);
       System.out.println("done shuffling");
       return terms.toArray(new BytesRef[0]);
     }
   
     private static void insert(BytesRef[] testData, int round) {
       for (int r = 0; r < round; r++) {
         BytesRefHash hash = new BytesRefHash();
         int uniqueCount = 0;
         long start = System.nanoTime();
         for (BytesRef ref : testData) {
           int pos = hash.add(ref);
           if (pos >= 0) {
             uniqueCount += 1;
           }
         }
         long insertTimeNs = System.nanoTime() - start;
         System.out.printf(
             "Inserted %d terms in %.2f ms, unique term %d\n",
             testData.length, insertTimeNs / 1_000_000.0, uniqueCount);
         /*
         System.out.printf(
             "rehashTimes %d, rehashTimeMs %d, calcHashTimeMs %d\n",
             hash.rehashTimes, hash.rehashTimeMs, hash.calcHashTimeMs);
         */
       }
     }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


