[jira] [Commented] (COLLECTIONS-803) CaseInsensitiveMap prevent duplicate key conversion on put

Alex Herbert (Jira) Thu, 30 Mar 2023 05:38:06 -0700


    [ 
https://issues.apache.org/jira/browse/COLLECTIONS-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706861#comment-17706861
 ]


Alex Herbert commented on COLLECTIONS-803:
------------------------------------------

To check what we are expecting for the performance change I created a JMH 
benchmark that compares adding objects to the current implementation vs:
 * CaseInsensitiveMapCache: The cache implementation shown above (only 
overrides convertKey).
 * CaseInsensitiveMapSingleConvert: An update of the map to override put and 
reuse the converted key when adding a new entry.
 * CaseInsensitiveMapToLower: An update to use String.toLowerCase(Locale.ROOT). 
This conversion is performed twice.
 * CaseInsensitiveMapSingleConvertToLower: An update of the map to override put 
and reuse the converted key when adding a new entry. The conversion uses 
String.toLowerCase(Locale.ROOT).
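
As a rough illustration of what these variants change, the sketch below contrasts a per-character conversion with String.toLowerCase(Locale.ROOT), and a put that converts the key twice with one that reuses the converted key. It is not the benchmarked code: the class KeyConversionSketch and all of its method names are hypothetical, the per-character loop is an assumption about how the current convertKey behaves, a plain HashMap stands in for the map internals, and null keys are not handled.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; not the CaseInsensitiveMap source. */
final class KeyConversionSketch {
    /** Counts conversions so single vs. double conversion is visible. */
    static int conversions = 0;

    /** Per-character conversion (assumed to resemble the current convertKey). */
    static String convertPerChar(Object key) {
        conversions++;
        char[] chars = key.toString().toCharArray();
        for (int i = chars.length - 1; i >= 0; i--) {
            chars[i] = Character.toLowerCase(Character.toUpperCase(chars[i]));
        }
        return new String(chars);
    }

    /** Conversion delegated to String.toLowerCase(Locale.ROOT). */
    static String convertToLower(Object key) {
        conversions++;
        return key.toString().toLowerCase(Locale.ROOT);
    }

    /** Converts once for the lookup and again when a new entry is created. */
    static void putDoubleConvert(Map<String, Object> map, Object key, Object value) {
        String lookupKey = convertPerChar(key);
        if (map.containsKey(lookupKey)) {
            map.put(lookupKey, value);       // existing entry: value updated only
            return;
        }
        map.put(convertPerChar(key), value); // new entry: key converted a second time
    }

    /** Converts once and reuses the converted key for the new entry. */
    static void putSingleConvert(Map<String, Object> map, Object key, Object value) {
        String converted = convertToLower(key);
        map.put(converted, value);
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        putDoubleConvert(map, "AbC", 1);
        System.out.println("conversions after double-convert put: " + conversions);
        conversions = 0;
        putSingleConvert(map, "XyZ", 2);
        System.out.println("conversions after single-convert put: " + conversions);
    }
}
```

The counter only makes the number of conversions per put visible; it says nothing about timing, which is what the JMH benchmark measures.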

I added random long values and random Strings of length 50 created from the 
Base64 alphabet.
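
A minimal sketch of how such keys could be generated is shown here. The class RandomKeySketch, the createKeys helper and the seed parameter are hypothetical and not taken from the benchmark; the standard Base64 alphabet (A-Z, a-z, 0-9, '+', '/') is assumed.

```java
import java.util.Random;

/** Hypothetical key generator; the benchmark's actual setup may differ. */
final class RandomKeySketch {
    /** Standard Base64 alphabet (assumed; 64 characters). */
    static final String BASE64_ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Builds a random String of the given length from the Base64 alphabet. */
    static String randomBase64String(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASE64_ALPHABET.charAt(rng.nextInt(BASE64_ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Creates 'size' keys: random Long values or random Strings of length 50. */
    static Object[] createKeys(String objectType, int size, long seed) {
        Random rng = new Random(seed);
        Object[] keys = new Object[size];
        for (int i = 0; i < size; i++) {
            if ("Long".equals(objectType)) {
                keys[i] = Long.valueOf(rng.nextLong());
            } else {
                keys[i] = randomBase64String(rng, 50);
            }
        }
        return keys;
    }
}
```
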
||mapType||objectType||size||Method||Score||Error||Speed increase||
|CaseInsensitiveMap|Long|10000|put|2159865.547|33780.31237| |
|CaseInsensitiveMapCache|Long|10000|put|1663796.928|54252.36748|1.298|
|CaseInsensitiveMapSingleConvert|Long|10000|put|1361187.795|56001.07637|1.587|
|CaseInsensitiveMapToLower|Long|10000|put|1443153.695|103557.629|1.497|
|CaseInsensitiveMapSingleConvertToLower|Long|10000|put|1110610.616|34136.88536|1.945|
|CaseInsensitiveMap|Base64|10000|put|1.46E+07|267112.2832| |
|CaseInsensitiveMapCache|Base64|10000|put|8068577.213|378623.5886|1.812|
|CaseInsensitiveMapSingleConvert|Base64|10000|put|7646249.161|188859.4017|1.912|
|CaseInsensitiveMapToLower|Base64|10000|put|9805182.416|236714.5242|1.491|
|CaseInsensitiveMapSingleConvertToLower|Base64|10000|put|5549242.736|196453.5884|2.635|
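
The "Speed increase" column appears to be the baseline CaseInsensitiveMap score divided by the variant's score for the same object type, which assumes the score is a time-like measure where lower is better. A small sketch of that arithmetic, using a hypothetical helper name:

```java
/** Hypothetical helper reproducing the "Speed increase" column. */
final class SpeedIncreaseSketch {
    /** Ratio of baseline score to variant score (assumes lower score is better). */
    static double speedIncrease(double baselineScore, double variantScore) {
        return baselineScore / variantScore;
    }

    public static void main(String[] args) {
        // Long keys: CaseInsensitiveMap vs CaseInsensitiveMapSingleConvertToLower
        System.out.printf("%.3f%n", speedIncrease(2159865.547, 1110610.616));
    }
}
```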

This is a best case for performance changes as objects are very unlikely to be 
duplicates and all objects are added to the map, so every put creates a new 
entry and triggers the second conversion.

Observations:
 * Avoiding a second conversion of the key is roughly 1.5x faster for Long keys, 
whose string form requires no case conversion and uses a limited alphabet.
 * Avoiding a second conversion of the key is roughly 2x faster for String 
keys, which use a larger ASCII alphabet.
 * Switching key conversion to use String.toLowerCase is roughly 1.5x faster. 
This gain is stable between the Long and String keys.
 * Avoiding a second conversion and using String.toLowerCase can be 2 - 2.5x 
faster.

So key conversion is the main overhead in the map put function.

> CaseInsensitiveMap prevent duplicate key conversion on put
> ----------------------------------------------------------
>
>                 Key: COLLECTIONS-803
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-803
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Map
>    Affects Versions: 4.4
>            Reporter: Simulant
>            Priority: Minor
>              Labels: performance
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When adding a new item to a {{CaseInsensitiveMap}} the {{convertKey(key)}} 
> method is called twice: once in the {{put(key, value)}} method and again in 
> the {{createEntry(next, hashCode, key, value)}} method. The result could be 
> reused, resulting in better performance.
> Depending on the {{toString()}} implementation of the key and the length of 
> the resulting string before the lower case conversion, the operation can get 
> expensive and should not be called twice. {{CaseInsensitiveMap}} overrides 
> the {{convertKey(key)}} method, making it more expensive and dependent on the 
> input, unlike the implementation in {{AbstractHashedMap}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
