On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:

>> Rewrite the IndexOf code without the pcmpestri instruction, using only
>> AVX2 instructions. This change speeds up String.indexOf by 1.3x on average
>> for AVX2. The benchmark numbers:
>> 
>> 
>> Benchmark                                  Score        Latest       Latest/Score
>> StringIndexOf.advancedWithMediumSub        343.573      317.934      0.925375393x
>> StringIndexOf.advancedWithShortSub1        1039.081     1053.96      1.014319384x
>> StringIndexOf.advancedWithShortSub2        55.828       110.541      1.980027943x
>> StringIndexOf.constantPattern              9.361        11.906       1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216        4.218        1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133        3.216        1.02649218x
>> StringIndexOf.searchCharShortSuccess       3.76         3.761        1.000265957x
>> StringIndexOf.success                      9.186        9.713        1.057369911x
>> StringIndexOf.successBig                   14.341       46.343       3.231504079x
>> StringIndexOfChar.latin1_AVX2_String       6220.918     12154.52     1.953814533x
>> StringIndexOfChar.latin1_AVX2_char         5503.556     5540.044     1.006629895x
>> StringIndexOfChar.latin1_SSE4_String       6978.854     6818.689     0.977049957x
>> StringIndexOfChar.latin1_SSE4_char         5657.499     5474.624     0.967675646x
>> StringIndexOfChar.latin1_Short_String      7132.541     6863.359     0.962260014x
>> StringIndexOfChar.latin1_Short_char        16013.389    16162.437    1.009307711x
>> StringIndexOfChar.latin1_mixed_String      7386.123     14771.622    1.999915517x
>> StringIndexOfChar.latin1_mixed_char        9901.671     9782.245     0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314:

> 312: 
> 313:     // needle_len is in elements, not bytes, for UTF-16
> 314:     __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX);

OPT_NEEDLE_SIZE_MAX is an odd number (set to 5). Should it have been an even 
number?
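
To illustrate the concern, here is a minimal standalone sketch of the arithmetic. It assumes OPT_NEEDLE_SIZE_MAX is an integer constant equal to 5 (as noted above) and that a UTF-16 element is 2 bytes; the helper names `needle_len_limit` and `limit_in_bytes` are hypothetical and do not exist in the patch.

```cpp
// Assumption: mirrors the value mentioned in this review; the real constant
// is defined in stubGenerator_x86_64_string.cpp.
constexpr int OPT_NEEDLE_SIZE_MAX = 5;  // bytes

// Hypothetical helper: the value compared against needle_len (in elements),
// written the same way as the operand of the cmpq on line 314.
constexpr int needle_len_limit(bool isUU) {
  return isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX;
}

// Hypothetical helper: the same limit converted back to bytes
// (2 bytes per element for UTF-16, 1 byte per element for Latin-1).
constexpr int limit_in_bytes(bool isUU) {
  return needle_len_limit(isUU) * (isUU ? 2 : 1);
}

static_assert(needle_len_limit(false) == 5, "Latin-1: limit is 5 elements");
static_assert(needle_len_limit(true) == 2, "UTF-16: 5 / 2 truncates to 2 elements");
static_assert(limit_in_bytes(true) == 4, "UTF-16: the limit covers 4 bytes, not 5");
```

If these assumptions hold, integer division makes the UTF-16 limit correspond to 4 bytes rather than 5, so the two encodings do not use the same byte threshold. That may be intended, but it is worth a comment either way.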

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329:

> 327:     ////////////////////////////////////////////////////////////////////////////////////////
> 328: 
> 329:     __ bind(L_begin);

So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A 
high-level description of the algorithm is needed in comments here so that the 
code below can be followed. It should describe the various paths in terms of 
haystack and needle sizes, and explain how to reason about the assembly code 
below so that a reader can verify that all paths are covered. Also, the 
abstraction level changes abruptly here: the code below is detailed inline 
assembly rather than methods for the various paths.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095
