ctubbsii commented on PR #3549:
URL: https://github.com/apache/accumulo/pull/3549#issuecomment-1612819859

   For what it's worth, I got substantially wider error bars a few times when I 
ran the gc-test-update branch (at commit 
b27d9fa56b68e7ab58a9e6c4d9b894ae35724c80) locally. Each run took about 10 
minutes (using `mvn clean verify -Dit.test=GarbageCollectorPerformanceIT 
-Dversion.accumulo=2.1.2-SNAPSHOT`), and all were very spammy with regard to 
hadoop log messages about decompressors.
   
   Click each to expand/collapse details.
   
   <details>
   <summary>Current 2.1.2-SNAPSHOT at 
544ef01681e1b1080fa6afbb17a24721e20b638b:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.892 ±  0.181  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.009 ±  0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    6.250 ±  2.679  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.006 ±  0.003  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   39.701 ±  2.739  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.004 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  314.899 ± 34.392  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.003 ±  0.001  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Current PR merged onto 2.1.2-SNAPSHOT at 
544ef01681e1b1080fa6afbb17a24721e20b638b:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.812 ±   0.150  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.008 ±   0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.501 ±   2.058  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.005 ±   0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   28.047 ±  12.950  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  176.200 ± 289.206  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±   0.003  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Slight change to move the IntPredicate outside the method to a 
private static final variable at the top of the file:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.710 ±   0.083  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.007 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.269 ±   1.178  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.004 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   25.945 ±   6.227  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  177.723 ± 427.999  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±   0.004  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Ran that same one again, because the error bars looked crazy to me 
(`±428` is crazy for a value of 178... there must've been some kind of outlier 
due to some other activity on the box):</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.706 ±  0.186  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.007 ±  0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.315 ±  1.793  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.004 ±  0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   26.623 ± 12.256  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  168.236 ± 47.627  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±  0.001  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Second slight change to use for loop with indexed pointer, keeping 
the predicate and calling `predicate.test(ch)`:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.713 ±  0.089  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.007 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.440 ±  1.746  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.004 ±  0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   27.450 ± 13.024  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  165.311 ± 68.361  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±  0.001  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Third slight change to use for loop with indexed pointer, and 
getting rid of the predicate in favor of inlining the checks into the `if` 
statement's condition:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.714 ±  0.019  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.007 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.318 ±  0.779  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.004 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   27.488 ±  9.656  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  166.778 ± 60.033  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±  0.001  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Fourth slight change to cover case where the string is empty (this 
case was initially omitted from this PR, even though the regex checked for 
it):</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.788 ±   0.658  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.008 ±   0.007  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    4.453 ±   2.814  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.004 ±   0.003  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   27.361 ±  12.971  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.003 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  168.248 ± 105.847  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.002 ±   0.001  ms/op
   ```
   
   </details>
   
   <details>
   <summary>Instead of doing the change in this PR, merely using a precompiled 
Pattern rather than calling `String.matches` results in:</summary>
   
   ```
   Benchmark                                                   (splitsRfile)  
Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  
avgt    3    0.875 ±  0.162  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  
avgt    3    0.009 ±  0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  
avgt    3    6.026 ±  0.598  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  
avgt    3    0.006 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  
avgt    3   38.700 ±  7.029  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  
avgt    3    0.004 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  
avgt    3  292.542 ± 44.951  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  
avgt    3    0.003 ±  0.001  ms/op
   ```
   
   </details>
   
   So, these lower numbers look promising, but the outliers that widen the 
error bars make me skeptical of how consistent these are, as they seem to show 
some sensitivity to the environment, even though I wasn't actively using my 
machine for much else while they ran.
   
   My conclusions:
   
   1. the for loop does not matter over the codePoints IntStream, so the 
current IntStream is fine
   2. the predicate object does not matter over inlining the `if` condition 
into the method, so the current use of the predicate is fine
   3. moving the predicate out of the method doesn't hurt or help, but is 
better code structure
   4. adding coverage for the empty string case is needed
   5. the regex pattern is worse, even if pre-compiled, which only helped a 
little, and is not worth doing
   
   I can add a commit to this PR for 3 and 4, then merge this, because I think 
it will be done after that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to