neoremind opened a new pull request, #16280:
URL: https://github.com/apache/lucene/pull/16280

   
   ## Background
   
   In #13863, `ByteBuffersDataOutput.writeString()` was optimized to avoid
   allocating a `BytesRef` and copying bytes into the destination buffer;
   instead, it encodes directly in place. However, this requires two passes
   over the input string's chars: first `calcUTF16toUTF8Length` to compute the
   length for the VInt prefix, then `UTF16toUTF8` for the UTF-8 encoding. The
   opportunity: for short strings, we can skip that first pass.
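
   To make the two passes concrete, here is a minimal, self-contained sketch.
   It is not the Lucene code: the class `TwoPassWriteString` and its methods
   are hypothetical names, a `ByteArrayOutputStream` stands in for the block
   buffer, and the JDK encoder stands in for `UTF16toUTF8`. It assumes
   well-formed UTF-16 input.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.nio.charset.StandardCharsets;

   // Hypothetical stand-in for the two-pass scheme; not Lucene code.
   final class TwoPassWriteString {

     // Pass 1: count the UTF-8 bytes a well-formed UTF-16 string needs.
     static int utf8Length(CharSequence s) {
       int bytes = 0;
       for (int i = 0; i < s.length(); i++) {
         char c = s.charAt(i);
         if (c < 0x80) {
           bytes += 1;
         } else if (c < 0x800) {
           bytes += 2;
         } else if (Character.isHighSurrogate(c)
             && i + 1 < s.length()
             && Character.isLowSurrogate(s.charAt(i + 1))) {
           bytes += 4; // a surrogate pair is one 4-byte code point
           i++;
         } else {
           bytes += 3;
         }
       }
       return bytes;
     }

     // Variable-length int: 7 payload bits per byte, high bit set on all but the last byte.
     static void writeVInt(ByteArrayOutputStream out, int value) {
       while ((value & ~0x7F) != 0) {
         out.write((value & 0x7F) | 0x80);
         value >>>= 7;
       }
       out.write(value);
     }

     static byte[] writeString(String s) {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       writeVInt(out, utf8Length(s)); // pass 1: scan only to learn the length
       byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // pass 2: encode (JDK encoder as a stand-in)
       out.write(utf8, 0, utf8.length);
       return out.toByteArray();
     }
   }
   ```

   The length scan in `utf8Length` touches every char before any byte is
   written, which is the pass a short-string shortcut can avoid.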
   
   ## What this PR does
   
   This PR adds a single-pass fast path for short strings (charCount <= 42).
   A UTF-16 char encodes to at most 3 UTF-8 bytes, so the maximum UTF-8 length
   is `42 * 3 = 126` bytes, which always fits in a 1-byte VInt (max 127). We
   therefore know the VInt prefix size without scanning the string upfront:
   reserve 1 byte, encode directly into the destination buffer, then backfill
   the length. Strings that don't qualify for the shortcut fall back to the
   existing logic.
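
   The reserve, encode, backfill sequence can be sketched as follows. This is
   an illustration under simplifying assumptions, not the code in this PR:
   `FastPathWriteString` and `writeShortString` are hypothetical names, the
   destination is a flat `byte[]` rather than a list of blocks, and a lone
   surrogate is replaced with U+FFFD as a choice made for the sketch.

   ```java
   // Hypothetical sketch of the reserve-encode-backfill idea over a flat byte[]; not the PR's code.
   final class FastPathWriteString {
     static final int MAX_FAST_PATH_CHARS = 42; // 42 chars * 3 bytes = 126 <= 127 (1-byte VInt)

     // Writes [1-byte length][UTF-8 bytes] at dest[offset..] and returns the bytes written,
     // or -1 when the string is too long for the fast path. The caller must guarantee
     // that dest has room for 1 + 3 * s.length() bytes starting at offset.
     static int writeShortString(String s, byte[] dest, int offset) {
       int n = s.length();
       if (n > MAX_FAST_PATH_CHARS) {
         return -1; // caller falls back to the two-pass logic
       }
       int pos = offset + 1; // skip the reserved length byte
       for (int i = 0; i < n; i++) {
         int c = s.charAt(i);
         if (c < 0x80) {
           dest[pos++] = (byte) c;
         } else if (c < 0x800) {
           dest[pos++] = (byte) (0xC0 | (c >> 6));
           dest[pos++] = (byte) (0x80 | (c & 0x3F));
         } else if (Character.isHighSurrogate((char) c)
             && i + 1 < n
             && Character.isLowSurrogate(s.charAt(i + 1))) {
           int cp = Character.toCodePoint((char) c, s.charAt(++i));
           dest[pos++] = (byte) (0xF0 | (cp >> 18));
           dest[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
           dest[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
           dest[pos++] = (byte) (0x80 | (cp & 0x3F));
         } else {
           if (Character.isSurrogate((char) c)) {
             c = 0xFFFD; // lone surrogate: substitute U+FFFD (sketch-only choice)
           }
           dest[pos++] = (byte) (0xE0 | (c >> 12));
           dest[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
           dest[pos++] = (byte) (0x80 | (c & 0x3F));
         }
       }
       dest[offset] = (byte) (pos - offset - 1); // backfill the length into the reserved byte
       return pos - offset;
     }
   }
   ```

   Because a surrogate pair contributes 4 bytes for 2 chars, the `3 * charCount`
   bound still holds for supplementary characters.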
   
   To my understanding, this could benefit stored-field writes of short
   strings such as business-related keywords, IDs, and titles, as well as
   short strings in field infos, codec metadata, segment names, etc.
   
   ## Benchmarks
   
   I added a JMH benchmark comparing the new implementation against the
   current one across ASCII, CJK, and Latin-extended strings of various
   lengths. The benchmark branch
   ([here](https://github.com/apache/lucene/compare/main...neoremind:lucene:bbo_writestring_fast_path_bench?expand=1#diff-d6793a43f462bc34205113c143695761c1fbe50e7197494de9bc4686569fc8c6R451))
   keeps a copy of the current implementation for an apples-to-apples
   comparison. The target written byte sizes match stored fields chunk sizes:
   80KB (BEST_SPEED default), 480KB (BEST_COMPRESSION default), and 2MB
   (imagine a customized larger chunk in the stored fields .fdt file). The
   benchmark uses a resettable `ByteBuffersDataOutput` starting with 1KB
   blocks to mimic a real-world workload.
   
   Results show notable gains on short strings and no regressions on medium,
   long, and very large strings, which fall through to the unchanged logic
   (I saw only acceptable jitter there).
   
   Throughput is in ops/s; each operation writes the target number of bytes
   into the buffer. Measured on an EC2 m5.2xlarge.
   
   <details>
   <summary> See detailed results </summary>
   
   ```
   
   Benchmark                                               (stringType)  (targetBytes)   Mode  Cnt      Score     Error  Units
   ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1          81920  thrpt   15   1924.154 ±   3.998  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1         491520  thrpt   15    325.054 ±   0.712  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl            ascii_1        2097152  thrpt   15     77.335 ±   0.249  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10          81920  thrpt   15   5127.397 ± 124.657  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10         491520  thrpt   15    894.737 ±   4.701  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_10        2097152  thrpt   15    206.414 ±   2.523  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20          81920  thrpt   15   7907.056 ±  28.022  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20         491520  thrpt   15   1374.817 ±   4.420  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_20        2097152  thrpt   15    325.101 ±   0.932  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30          81920  thrpt   15   9654.601 ±  40.498  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30         491520  thrpt   15   1764.192 ±   6.306  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_30        2097152  thrpt   15    416.434 ±   1.790  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40          81920  thrpt   15  10563.802 ±  30.043  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40         491520  thrpt   15   1891.552 ±   4.140  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           ascii_40        2097152  thrpt   15    449.588 ±   4.443  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium          81920  thrpt   15   9263.776 ±  98.204  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium         491520  thrpt   15   1514.433 ±   0.863  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_medium        2097152  thrpt   15    356.831 ±   0.588  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long          81920  thrpt   15  12117.442 ± 424.084  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long         491520  thrpt   15   2114.019 ±   2.865  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         ascii_long        2097152  thrpt   15    503.861 ±   5.616  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge          81920  thrpt   15  11603.539 ±  28.604  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge         491520  thrpt   15   2050.525 ±   1.159  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       ascii_vlarge        2097152  thrpt   15    519.435 ±   5.892  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1          81920  thrpt   15   3598.613 ±  27.463  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1         491520  thrpt   15    589.760 ±   2.930  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              cjk_1        2097152  thrpt   15    142.267 ±   1.822  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10          81920  thrpt   15   6516.930 ± 155.093  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10         491520  thrpt   15   1124.501 ±  51.999  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_10        2097152  thrpt   15    268.392 ±  10.699  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20          81920  thrpt   15   7444.068 ±  28.467  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20         491520  thrpt   15   1251.821 ±  63.880  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_20        2097152  thrpt   15    316.346 ±   4.879  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30          81920  thrpt   15   7735.062 ±  33.040  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30         491520  thrpt   15   1369.589 ±  23.248  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_30        2097152  thrpt   15    310.114 ±  12.392  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40          81920  thrpt   15   7861.299 ±  44.006  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40         491520  thrpt   15   1426.798 ±   1.373  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl             cjk_40        2097152  thrpt   15    328.560 ±   8.392  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium          81920  thrpt   15   5302.579 ±  67.898  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium         491520  thrpt   15    829.204 ±   5.262  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_medium        2097152  thrpt   15    210.442 ±   0.308  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long          81920  thrpt   15   5704.934 ± 119.140  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long         491520  thrpt   15    934.739 ±  31.456  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl           cjk_long        2097152  thrpt   15    211.968 ±   3.531  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge          81920  thrpt   15   6736.329 ± 244.534  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge         491520  thrpt   15    927.611 ±  12.725  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl         cjk_vlarge        2097152  thrpt   15    231.230 ±   4.009  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1          81920  thrpt   15   2330.881 ±  32.202  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1         491520  thrpt   15    398.409 ±   5.090  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl        latin_ext_1        2097152  thrpt   15     93.175 ±   1.428  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10          81920  thrpt   15   4296.039 ±  48.292  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10         491520  thrpt   15    748.831 ±   5.288  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_10        2097152  thrpt   15    178.731 ±   2.817  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20          81920  thrpt   15   4953.465 ±  80.963  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20         491520  thrpt   15    859.932 ±  27.221  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_20        2097152  thrpt   15    206.179 ±   6.109  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30          81920  thrpt   15   5053.684 ± 232.941  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30         491520  thrpt   15    878.187 ±  10.097  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_30        2097152  thrpt   15    208.340 ±   1.234  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40          81920  thrpt   15   4932.669 ±   9.067  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40         491520  thrpt   15    962.194 ±  57.633  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl       latin_ext_40        2097152  thrpt   15    216.052 ±   2.011  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium          81920  thrpt   15   3523.366 ±  14.522  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium         491520  thrpt   15    593.160 ±   3.174  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_medium        2097152  thrpt   15    138.684 ±   0.154  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long          81920  thrpt   15   3652.496 ±  86.858  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long         491520  thrpt   15    630.856 ±  23.506  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl     latin_ext_long        2097152  thrpt   15    152.758 ±   5.463  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge          81920  thrpt   15   4227.879 ±   7.569  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge         491520  thrpt   15    633.812 ±   1.601  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl   latin_ext_vlarge        2097152  thrpt   15    148.096 ±   0.526  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed          81920  thrpt   15   2610.423 ±   8.035  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed         491520  thrpt   15    526.189 ±  11.442  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.newImpl              mixed        2097152  thrpt   15    117.501 ±   5.147  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1          81920  thrpt   15   1449.904 ±   0.730  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1         491520  thrpt   15    237.547 ±   0.981  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl           ascii_1        2097152  thrpt   15     55.849 ±   0.035  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10          81920  thrpt   15   3632.715 ±   7.330  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10         491520  thrpt   15    608.009 ±   1.032  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_10        2097152  thrpt   15    143.089 ±   0.086  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20          81920  thrpt   15   5513.255 ±  16.047  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20         491520  thrpt   15    939.471 ±   0.893  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_20        2097152  thrpt   15    221.746 ±   0.437  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30          81920  thrpt   15   6810.637 ±  33.651  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30         491520  thrpt   15   1180.119 ±   2.552  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_30        2097152  thrpt   15    276.847 ±   0.688  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40          81920  thrpt   15   7800.776 ±  14.315  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40         491520  thrpt   15   1310.465 ±   2.490  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          ascii_40        2097152  thrpt   15    311.610 ±   0.348  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium          81920  thrpt   15   9042.239 ±  37.124  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium         491520  thrpt   15   1470.004 ±   5.105  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_medium        2097152  thrpt   15    346.409 ±   0.763  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long          81920  thrpt   15  10884.157 ±  32.714  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long         491520  thrpt   15   2047.124 ±   3.786  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        ascii_long        2097152  thrpt   15    485.906 ±   0.356  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge          81920  thrpt   15  11570.370 ±  10.070  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge         491520  thrpt   15   2070.484 ±   1.673  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      ascii_vlarge        2097152  thrpt   15    506.705 ±  11.358  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1          81920  thrpt   15   2732.453 ±  18.110  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1         491520  thrpt   15    473.930 ±  11.438  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             cjk_1        2097152  thrpt   15    109.360 ±   2.644  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10          81920  thrpt   15   4078.860 ± 229.551  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10         491520  thrpt   15    729.199 ±  42.046  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_10        2097152  thrpt   15    163.849 ±   0.211  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20          81920  thrpt   15   4728.439 ± 108.248  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20         491520  thrpt   15    756.027 ±  28.522  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_20        2097152  thrpt   15    180.958 ±  11.565  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30          81920  thrpt   15   4945.852 ± 123.435  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30         491520  thrpt   15    853.268 ±   4.967  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_30        2097152  thrpt   15    199.801 ±   0.083  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40          81920  thrpt   15   5080.684 ± 114.575  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40         491520  thrpt   15    872.155 ±   0.935  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl            cjk_40        2097152  thrpt   15    198.099 ±   5.012  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium          81920  thrpt   15   5114.304 ±  16.729  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium         491520  thrpt   15    836.790 ±   3.880  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_medium        2097152  thrpt   15    193.791 ±  14.359  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long          81920  thrpt   15   5636.091 ±  96.048  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long         491520  thrpt   15    899.898 ±   4.430  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl          cjk_long        2097152  thrpt   15    211.120 ±   0.845  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge          81920  thrpt   15   6610.988 ± 368.882  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge         491520  thrpt   15    897.061 ±  15.893  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl        cjk_vlarge        2097152  thrpt   15    226.848 ±   9.797  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1          81920  thrpt   15   1707.395 ±  20.488  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1         491520  thrpt   15    290.791 ±   0.661  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl       latin_ext_1        2097152  thrpt   15     68.084 ±   0.438  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10          81920  thrpt   15   2562.599 ±  27.365  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10         491520  thrpt   15    437.844 ±   3.480  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_10        2097152  thrpt   15    103.573 ±   0.355  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20          81920  thrpt   15   2849.567 ±   5.463  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20         491520  thrpt   15    488.922 ±   4.148  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_20        2097152  thrpt   15    114.500 ±   0.159  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30          81920  thrpt   15   3112.005 ± 104.903  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30         491520  thrpt   15    519.170 ±   1.386  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_30        2097152  thrpt   15    125.173 ±   4.172  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40          81920  thrpt   15   3159.485 ±  13.467  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40         491520  thrpt   15    545.461 ±  10.699  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl      latin_ext_40        2097152  thrpt   15    129.708 ±   4.595  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium          81920  thrpt   15   3521.568 ±   4.052  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium         491520  thrpt   15    604.327 ±  17.521  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_medium        2097152  thrpt   15    138.913 ±   0.268  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long          81920  thrpt   15   3583.787 ±  28.151  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long         491520  thrpt   15    619.880 ±   9.109  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl    latin_ext_long        2097152  thrpt   15    156.162 ±   0.251  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge          81920  thrpt   15   4230.539 ±  11.689  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge         491520  thrpt   15    636.914 ±   1.179  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl  latin_ext_vlarge        2097152  thrpt   15    147.291 ±   0.189  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed          81920  thrpt   15   2569.503 ±  34.528  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed         491520  thrpt   15    471.877 ±  13.853  ops/s
   ByteBuffersDataOutputWriteStringBenchmark.prevImpl             mixed        2097152  thrpt   15    111.679 ±   0.714  ops/s
   
   ```
   
   </details>
   
   ### 80KB target (BEST_SPEED chunk size)
   
   | String Type | New | Prev | Delta |
   |---|---|---|---|
   | ascii_1 | 1924 | 1450 | **+33%** |
   | ascii_10 | 5127 | 3633 | **+41%** |
   | ascii_20 | 7907 | 5513 | **+43%** |
   | ascii_30 | 9655 | 6811 | **+42%** |
   | ascii_40 | 10564 | 7801 | **+35%** |
   | ascii_medium | 9264 | 9042 | +2% |
   | ascii_long | 12117 | 10884 | +11% |
   | ascii_vlarge | 11604 | 11570 | 0% |
   | cjk_1 | 3599 | 2732 | **+32%** |
   | cjk_10 | 6517 | 4079 | **+60%** |
   | cjk_20 | 7444 | 4728 | **+57%** |
   | cjk_30 | 7735 | 4946 | **+56%** |
   | cjk_40 | 7861 | 5081 | **+55%** |
   | cjk_medium | 5303 | 5114 | +4% |
   | cjk_long | 5705 | 5636 | +1% |
   | cjk_vlarge | 6736 | 6611 | +2% |
   | latin_ext_1 | 2331 | 1707 | **+37%** |
   | latin_ext_10 | 4296 | 2563 | **+68%** |
   | latin_ext_20 | 4953 | 2850 | **+74%** |
   | latin_ext_30 | 5054 | 3112 | **+62%** |
   | latin_ext_40 | 4933 | 3159 | **+56%** |
   | latin_ext_medium | 3523 | 3522 | 0% |
   | latin_ext_long | 3652 | 3584 | +2% |
   | latin_ext_vlarge | 4228 | 4231 | 0% |
   | mixed | 2610 | 2570 | +2% |
   
   ### 480KB target (BEST_COMPRESSION chunk size)
   
   | String Type | New | Prev | Delta |
   |---|---|---|---|
   | ascii_1 | 325 | 238 | **+37%** |
   | ascii_10 | 895 | 608 | **+47%** |
   | ascii_20 | 1375 | 939 | **+46%** |
   | ascii_30 | 1764 | 1180 | **+49%** |
   | ascii_40 | 1892 | 1310 | **+44%** |
   | ascii_medium | 1514 | 1470 | +3% |
   | ascii_long | 2114 | 2047 | +3% |
   | ascii_vlarge | 2051 | 2070 | −1% |
   | cjk_1 | 590 | 474 | **+24%** |
   | cjk_10 | 1125 | 729 | **+54%** |
   | cjk_20 | 1252 | 756 | **+66%** |
   | cjk_30 | 1370 | 853 | **+61%** |
   | cjk_40 | 1427 | 872 | **+64%** |
   | cjk_medium | 829 | 837 | −1% |
   | cjk_long | 935 | 900 | +4% |
   | cjk_vlarge | 928 | 897 | +3% |
   | latin_ext_1 | 398 | 291 | **+37%** |
   | latin_ext_10 | 749 | 438 | **+71%** |
   | latin_ext_20 | 860 | 489 | **+76%** |
   | latin_ext_30 | 878 | 519 | **+69%** |
   | latin_ext_40 | 962 | 545 | **+76%** |
   | latin_ext_medium | 593 | 604 | −2% |
   | latin_ext_long | 631 | 620 | +2% |
   | latin_ext_vlarge | 634 | 637 | 0% |
   | mixed | 526 | 472 | **+12%** |
   
   ### 2MB target (larger workload)
   
   | String Type | New | Prev | Delta |
   |---|---|---|---|
   | ascii_1 | 77 | 56 | **+38%** |
   | ascii_10 | 206 | 143 | **+44%** |
   | ascii_20 | 325 | 222 | **+47%** |
   | ascii_30 | 416 | 277 | **+50%** |
   | ascii_40 | 450 | 312 | **+44%** |
   | ascii_medium | 357 | 346 | +3% |
   | ascii_long | 504 | 486 | +4% |
   | ascii_vlarge | 519 | 507 | +3% |
   | cjk_1 | 142 | 109 | **+30%** |
   | cjk_10 | 268 | 164 | **+64%** |
   | cjk_20 | 316 | 181 | **+75%** |
   | cjk_30 | 310 | 200 | **+55%** |
   | cjk_40 | 329 | 198 | **+66%** |
   | cjk_medium | 210 | 194 | +9% |
   | cjk_long | 212 | 211 | 0% |
   | cjk_vlarge | 231 | 227 | +2% |
   | latin_ext_1 | 93 | 68 | **+37%** |
   | latin_ext_10 | 179 | 104 | **+73%** |
   | latin_ext_20 | 206 | 115 | **+80%** |
   | latin_ext_30 | 208 | 125 | **+66%** |
   | latin_ext_40 | 216 | 130 | **+67%** |
   | latin_ext_medium | 139 | 139 | 0% |
   | latin_ext_long | 153 | 156 | −2% |
   | latin_ext_vlarge | 148 | 147 | +1% |
   | mixed | 118 | 112 | **+5%** |
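
   The Delta columns in the tables above are consistent with the relative
   change of the new score over the previous one, rounded to a whole percent.
   A small sketch of that arithmetic, using a hypothetical helper name
   (`ThroughputDelta.percentDelta`) and raw JMH scores from the detailed
   results:

   ```java
   // Hypothetical helper for reproducing a Delta cell from two JMH scores.
   final class ThroughputDelta {
     // Relative change of newScore over prevScore, rounded to the nearest whole percent.
     static long percentDelta(double newScore, double prevScore) {
       return Math.round((newScore / prevScore - 1.0) * 100.0);
     }

     public static void main(String[] args) {
       // ascii_10 at 81920 bytes: 5127.397 vs 3632.715 ops/s
       System.out.println(percentDelta(5127.397, 3632.715)); // prints 41
       // ascii_vlarge at 491520 bytes: 2050.525 vs 2070.484 ops/s
       System.out.println(percentDelta(2050.525, 2070.484)); // prints -1
     }
   }
   ```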
   
   ## More thoughts
   
   I initially attempted a more aggressive approach: adding a second fast
   path for 2-byte VInts (charCount 128–5461) and a
   `calcVIntSizeForUTF8Length` utility method with early-exit scanning for the
   ambiguous ranges. This showed strong wins in almost all setups, especially
   in configurations with larger block sizes or a larger target written size
   (more docs per chunk or a larger chunk size). With the default settings
   (80KB chunk / 1024 docs), however, there was one ~5% regression on
   `ascii_medium`, and it introduced extra branches and more complex logic.
   So I kept it simple: only the 1-byte VInt fast path. The code is
   straightforward and easy to read, and there are no regressions in any case.
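
   The char-count ranges above follow from two bounds: a string of `n` chars
   encodes to at least `n` and at most `3 * n` UTF-8 bytes, and a VInt holds
   up to 127 in 1 byte and up to 16383 in 2 bytes. A sketch of that
   classification, with a hypothetical name (`prefixSizeFromCharCount`) that
   is not the `calcVIntSizeForUTF8Length` method from the abandoned attempt:

   ```java
   // Hypothetical classification of VInt prefix size from the char count alone.
   final class VIntPrefixRanges {
     // Returns the guaranteed VInt prefix size in bytes, or -1 when the char count
     // alone cannot decide (or the range is not covered by this sketch).
     static int prefixSizeFromCharCount(int charCount) {
       if (charCount <= 42) {
         return 1; // at most 42 * 3 = 126 bytes, always a 1-byte VInt
       }
       if (charCount < 128) {
         return -1; // 43..127 chars: 43..381 bytes, could need 1 or 2 prefix bytes
       }
       if (charCount <= 5461) {
         return 2; // at least 128 bytes and at most 5461 * 3 = 16383 bytes
       }
       return -1; // beyond the 2-byte guarantee; needs a length scan
     }
   }
   ```

   The 43..127 band is where a scan of the string would still be needed,
   which is the kind of extra branching the simpler design avoids.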


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

