dtspence commented on issue #3499: URL: https://github.com/apache/accumulo/issues/3499#issuecomment-1606337576
The results below use an updated tests branch implementing 1) multiple metadata splits, 2) parameterized splits/files, and 3) multiple table splits. The tests cover several implementations, including the original 2.1.1 baseline and version 1.10.3. I had been using #3501 for benchmarking during the week, so it is included below. The profiler comparison on the updated 2.1.2-SNAPSHOT (latest) suggested there is some overhead in `Path()` and `validateFileName()` due to regex evaluation, which does not appear in #3501.

The parameterized splits/files changed the per-operation output: the primary score now reflects overall execution time, and the per-operation metric was relocated to `rfile/op` (in the data below).

The logic is currently on two separate branches (kept separate to avoid any impact on testing late last week):
https://github.com/dtspence/accumulo-jmh-test/tree/gc-test-update
https://github.com/dtspence/accumulo-jmh-test/tree/accumulo-1.10

---

**Accumulo 1.10:**
The following details the performance of the `getReferenceIterator()` method in Accumulo 1.10.3. The test extracted the method from SimpleGarbageCollector and executed it in isolation. The test used multiple metadata tablets per t-server and included the other changes from the `gc-test-update` branch.
Location of method extracted: https://github.com/apache/accumulo/blob/1.10/server/gc/src/main/java/org/apache/accumulo/gc/SimpleGarbageCollector.java#L323

```
Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     0.471 ±   0.143  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.005 ±   0.001  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3     2.936 ±   0.482  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.003 ±   0.001  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3    39.415 ±  11.400  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.004 ±   0.001  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3   357.505 ± 231.943  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.004 ±   0.002  ms/op
```

---

**Accumulo 2.1.1:**
The following is a baseline test of 2.1.1.

```
Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     2.377 ±   0.726  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.024 ±   0.007  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    16.781 ±   8.778  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.017 ±   0.009  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3   137.649 ±  67.144  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.014 ±   0.007  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3  1207.793 ± 528.944  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.012 ±   0.005  ms/op
```

**PR 3501:**
The logic in #3501 resulted in the following times.
```
Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     1.241 ±   0.559  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.012 ±   0.006  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3     5.304 ±   3.251  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.005 ±   0.003  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3    30.570 ±  11.835  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.003 ±   0.001  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3   201.415 ±  92.688  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.002 ±   0.001  ms/op
```

**Accumulo 2.1.2-SNAPSHOT:**
The following uses the most recent GC changes from 2.1.2-SNAPSHOT. Removing the `getParent()` calls improved performance considerably. The profiler showed two calls, 1) `Path.init()` and 2) `validateFileName()`, that contained additional regex calls.
```
Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     1.665 ±   0.211  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.017 ±   0.002  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3    10.139 ±  16.299  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.010 ±   0.016  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3    62.035 ±  24.129  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.006 ±   0.002  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3   521.994 ± 228.079  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.005 ±   0.002  ms/op
```

**Accumulo 2.1.2-SNAPSHOT (with Path(URI) and validateFileName()):**
The following is based on the GC changes (from 2.1.2-SNAPSHOT) and makes a minor change to use the `Path(URI)` constructor (mentioned previously), avoiding a regex call. It also used the alternate `validateFileName()` described below.

```
Benchmark                                                   (splitsRfile)  Mode  Cnt     Score     Error  Units
GarbageCollectorPerformanceIT.benchmarkReferences                   1,100  avgt    3     1.346 ±   0.402  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op          1,100  avgt    3     0.013 ±   0.004  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                  10,100  avgt    3     6.318 ±   2.323  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op         10,100  avgt    3     0.006 ±   0.002  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                 100,100  avgt    3    36.814 ±   2.904  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op        100,100  avgt    3     0.004 ±   0.001  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences                1000,100  avgt    3   244.694 ± 140.063  ms/op
GarbageCollectorPerformanceIT.benchmarkReferences:rfile/op       1000,100  avgt    3     0.002 ±   0.001  ms/op
```

**Validate FileName:**
Created an additional benchmark to experiment with a non-regex version of the check.
I looked through the PR comments and could not find whether it was specifically discussed.

```
Baseline (w/regex):
Benchmark                                              Mode  Cnt  Score   Error  Units
ValidationUtilPerformanceIT.benchmarkValidateFileName  avgt    3  5.359 ± 0.572  ms/op

Character comparison test:
Benchmark                                              Mode  Cnt  Score   Error  Units
ValidationUtilPerformanceIT.benchmarkValidateFileName  avgt    3  0.216 ± 0.047  ms/op
```

```java
// Requires java.util.Objects.
public static void validateFileName(String fileName) {
  Objects.requireNonNull(fileName);
  // Check each character directly instead of evaluating a regex.
  for (int i = 0; i < fileName.length(); i++) {
    final char ch = fileName.charAt(i);
    if (!(Character.isLetterOrDigit(ch) || ch == '-' || ch == '.' || ch == '_')) {
      throw new IllegalArgumentException(
          "Provided filename (" + fileName + ") contains invalid characters.");
    }
  }
}
```
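
One thing that may be worth checking before swapping the implementations is whether the character loop accepts exactly the same inputs as the regex. The sketch below compares three checks side by side. The pattern `[\dA-Za-z._-]+` is an assumption made for illustration (the real pattern in `ValidationUtil` should be confirmed), and the class and method names are hypothetical, not part of Accumulo.

```java
import java.util.regex.Pattern;

public class FileNameValidationSketch {

  // ASSUMPTION: illustrative stand-in for the baseline regex; verify against ValidationUtil.
  private static final Pattern ASSUMED_PATTERN = Pattern.compile("[\\dA-Za-z._-]+");

  // Regex-based check using the assumed pattern.
  static boolean isValidRegex(String fileName) {
    return ASSUMED_PATTERN.matcher(fileName).matches();
  }

  // Same acceptance rule as the loop in the comment above, returning a boolean
  // instead of throwing.
  static boolean isValidLetterOrDigitLoop(String fileName) {
    for (int i = 0; i < fileName.length(); i++) {
      final char ch = fileName.charAt(i);
      if (!(Character.isLetterOrDigit(ch) || ch == '-' || ch == '.' || ch == '_')) {
        return false;
      }
    }
    return true;
  }

  // Hypothetical ASCII-only variant that also rejects the empty string,
  // intended to match the assumed pattern exactly.
  static boolean isValidAsciiLoop(String fileName) {
    if (fileName.isEmpty()) {
      return false;
    }
    for (int i = 0; i < fileName.length(); i++) {
      final char ch = fileName.charAt(i);
      final boolean ok = (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z')
          || (ch >= 'a' && ch <= 'z') || ch == '-' || ch == '.' || ch == '_';
      if (!ok) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Sample names are made up for the comparison.
    String[] samples = {"F0000abc.rf", "A-1_b.rf", "bad/name.rf", "", "\u00e9.rf"};
    for (String s : samples) {
      System.out.println("\"" + s + "\" regex=" + isValidRegex(s)
          + " letterOrDigitLoop=" + isValidLetterOrDigitLoop(s)
          + " asciiLoop=" + isValidAsciiLoop(s));
    }
  }
}
```

If the baseline pattern really is ASCII-only and requires at least one character, the `Character.isLetterOrDigit` loop would differ in two edge cases: it accepts the empty string and non-ASCII letters such as `é`. Whether that matters for GC candidates is a judgment call; the ASCII variant is only one possible way to keep the behavior identical while still avoiding the regex.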
