dtspence commented on issue #3499:
URL: https://github.com/apache/accumulo/issues/3499#issuecomment-1606337576

   The results below use an updated tests branch implementing 1) multiple 
metadata splits, 2) parameterized splits/files, and 3) multiple table splits. 
The tests cover several implementations, including the original 2.1.1 
baseline and version 1.10.3. I had been using #3501 for benchmarking during 
the week, so its results are included below. The profiler comparison on the 
updated 2.1.2-SNAPSHOT (latest) suggested there is some overhead in `Path()` 
and `validateFileName()` due to regex evaluation, which does not appear in 
#3501.
   
   The parameterized splits/files changed the per-operation output, so the 
primary score now reports overall execution time. The per-operation metric was 
relocated to `rfile/op` (in the data below). 
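
   As a rough cross-check, the `rfile/op` rows in the tables below appear to 
equal the overall score divided by (splits × rfiles per split). This is 
inferred from the numbers, not confirmed from the benchmark source; it also 
assumes `splitsRfile` reads as `splits,rfiles`. The class and method names are 
hypothetical. A minimal sketch:

```java
import java.util.Locale;

public class RfilePerOp {
    // Hypothetical helper: spreads the overall score across every rfile
    // reference, assuming splitsRfile means "splits,rfilesPerSplit".
    static double perRfileMs(double totalMs, int splits, int rfilesPerSplit) {
        return totalMs / ((double) splits * rfilesPerSplit);
    }

    public static void main(String[] args) {
        // Accumulo 1.10.3 row for splitsRfile = 10,100: 2.936 ms/op overall.
        double perRfile = perRfileMs(2.936, 10, 100);
        // The table reports 0.003 ms/op for the matching rfile/op row.
        System.out.println(String.format(Locale.ROOT, "%.3f ms/op", perRfile));
    }
}
```

   The same division reproduces the other `rfile/op` rows to within rounding, 
for example 1207.793 / (1000 × 100) ≈ 0.012 for the 2.1.1 baseline.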
   
   The logic is currently on two separate branches (kept separate to avoid any 
impact to testing late last week):
   https://github.com/dtspence/accumulo-jmh-test/tree/gc-test-update
   https://github.com/dtspence/accumulo-jmh-test/tree/accumulo-1.10
   
   ---
   **Accumulo 1.10:** The following details Accumulo 1.10.3 performance of 
the `getReferenceIterator()` method. The test extracted the method from 
SimpleGarbageCollector and executed it in isolation. The test used multiple 
metadata tablets per t-server and included the other changes from the 
`gc-test-update` branch.
   
   Location of method extracted:
   
https://github.com/apache/accumulo/blob/1.10/server/gc/src/main/java/org/apache/accumulo/gc/SimpleGarbageCollector.java#L323
   ```
   Benchmark                                                   (splitsRfile)  Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3    0.471 ±   0.143  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3    0.005 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    2.936 ±   0.482  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3    0.003 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   39.415 ±  11.400  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3    0.004 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  357.505 ± 231.943  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3    0.004 ±   0.002  ms/op
   ```
   ---
   
   **Accumulo 2.1.1:** The following is a baseline test of 2.1.1.
   ```
   Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     2.377 ±   0.726  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.024 ±   0.007  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    16.781 ±   8.778  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.017 ±   0.009  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   137.649 ±  67.144  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.014 ±   0.007  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  1207.793 ± 528.944  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.012 ±   0.005  ms/op
   ```
   
   **PR 3501:** The logic in #3501 resulted in the following times.
   ```
   Benchmark                                                   (splitsRfile)  Mode  Cnt    Score    Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3    1.241 ±  0.559  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3    0.012 ±  0.006  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    5.304 ±  3.251  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3    0.005 ±  0.003  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   30.570 ± 11.835  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3    0.003 ±  0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  201.415 ± 92.688  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3    0.002 ±  0.001  ms/op
   ```
   
   **Accumulo 2.1.2-SNAPSHOT:** These results use the most recent GC changes 
from 2.1.2-SNAPSHOT. The removal of the `getParent()` calls helped a lot to 
improve performance. The profiler showed two calls, 1) `Path.init()` and 
2) `validateFileName()`, which contained additional regex calls. 
   ```
   Benchmark                                                   (splitsRfile)  Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3    1.665 ±   0.211  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3    0.017 ±   0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3   10.139 ±  16.299  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3    0.010 ±   0.016  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   62.035 ±  24.129  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3    0.006 ±   0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  521.994 ± 228.079  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3    0.005 ±   0.002  ms/op
   ```
   
   **Accumulo 2.1.2-SNAPSHOT (with Path(URI) and validateFileName()):** The 
following is based on the GC changes (from 2.1.2-SNAPSHOT) and makes a minor 
change to use the `Path(URI)` constructor (mentioned previously), avoiding a 
regex call. It also used the alternate `validateFileName()` described below.
   ```
   Benchmark                                                   (splitsRfile)  Mode  Cnt    Score     Error  Units
   GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3    1.346 ±   0.402  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3    0.013 ±   0.004  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    6.318 ±   2.323  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3    0.006 ±   0.002  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   36.814 ±   2.904  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3    0.004 ±   0.001  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  244.694 ± 140.063  ms/op
   GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3    0.002 ±   0.001  ms/op
   ```
   
   **Validate FileName:** I created an additional benchmark to experiment 
with a non-regex version of the check. I looked through the PR comments and 
could not find whether it was specifically discussed.
   
   ```
   Baseline (w/regex):
   Benchmark                                              Mode  Cnt  Score   Error  Units
   ValidationUtilPerformanceIT.benchmarkValidateFileName  avgt    3  5.359 ± 0.572  ms/op
   
   Character comparison test:
   Benchmark                                              Mode  Cnt  Score   Error  Units
   ValidationUtilPerformanceIT.benchmarkValidateFileName  avgt    3  0.216 ± 0.047  ms/op
   ```
   ```java 
       // Requires: import java.util.Objects;
       // Character-by-character check; avoids evaluating a regex per file name.
       public static void validateFileName(String fileName) {
           Objects.requireNonNull(fileName);
           for (int i = 0; i < fileName.length(); i++) {
               final char ch = fileName.charAt(i);
               if (!(Character.isLetterOrDigit(ch) || ch == '-' || ch == '.' || ch == '_')) {
                   throw new IllegalArgumentException(
                           "Provided filename (" + fileName + ") contains invalid characters.");
               }
           }
       }
   ```
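
   One thing to check before swapping implementations: 
`Character.isLetterOrDigit()` also accepts non-ASCII letters and digits, and 
the loop accepts an empty string because it never iterates. If the regex 
version is ASCII-only and requires at least one character (an assumption; the 
regex is not shown in this comment), the two checks would differ on those 
inputs. The sketch below uses hypothetical names (`FileNameCheck`, 
`isValidPosted`, `isValidAscii`) and returns booleans instead of throwing so 
the difference is easy to see.

```java
import java.util.Objects;

public class FileNameCheck {
    // Same character test as the loop posted above, as a boolean.
    static boolean isValidPosted(String fileName) {
        Objects.requireNonNull(fileName);
        for (int i = 0; i < fileName.length(); i++) {
            final char ch = fileName.charAt(i);
            if (!(Character.isLetterOrDigit(ch) || ch == '-' || ch == '.' || ch == '_')) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stricter variant: ASCII letters/digits only, non-empty.
    static boolean isValidAscii(String fileName) {
        Objects.requireNonNull(fileName);
        if (fileName.isEmpty()) {
            return false;
        }
        for (int i = 0; i < fileName.length(); i++) {
            final char ch = fileName.charAt(i);
            final boolean ok = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
                    || (ch >= '0' && ch <= '9') || ch == '-' || ch == '.' || ch == '_';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "\u00e9" is a non-ASCII letter (e with acute accent).
        String[] samples = {"A000001.rf", "\u00e9.rf", "", "bad/name.rf"};
        for (String s : samples) {
            System.out.println("[" + s + "] posted=" + isValidPosted(s)
                    + " ascii=" + isValidAscii(s));
        }
    }
}
```

   The two methods agree on `A000001.rf` (accepted) and `bad/name.rf` 
(rejected) and disagree on the non-ASCII and empty inputs. Whether that matters 
depends on what the existing regex actually permits.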
   