[
https://issues.apache.org/jira/browse/COLLECTIONS-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706861#comment-17706861
]
Alex Herbert commented on COLLECTIONS-803:
------------------------------------------
To check what performance change we can expect, I created a JMH
benchmark that compares adding objects to the current implementation against:
* CaseInsensitiveMapCache: The cache implementation shown above (only
overrides convertKey).
* CaseInsensitiveMapSingleConvert: An update of the map to override put and
reuse the converted key when adding a new entry.
* CaseInsensitiveMapToLower: An update to use String.toLowerCase(Locale.ROOT).
This conversion is performed twice.
* CaseInsensitiveMapSingleConvertToLower: An update of the map to override put
and reuse the converted key when adding a new entry. The conversion uses
String.toLowerCase(Locale.ROOT).
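To make the difference between the double-convert and single-convert variants concrete, here is a minimal toy model of the put path. It is not the Commons Collections source: the class name, the method names and the backing HashMap are all assumptions made for illustration, and the conversion counter exists only to show how often the key is converted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of the put path; hypothetical names, not the library code. */
public class SingleConvertSketch {
    /** Counts conversions so the two put strategies can be compared. */
    static int conversions = 0;

    /** Stand-in for convertKey: lower-cases the key's string form. */
    static String convertKey(Object key) {
        conversions++;
        return String.valueOf(key).toLowerCase(Locale.ROOT);
    }

    /** Converts once for the lookup and again when a new entry is created. */
    static void putDoubleConvert(Map<String, Object> table, Object key, Object value) {
        String lookupKey = convertKey(key);
        if (table.containsKey(lookupKey)) {
            table.put(lookupKey, value);        // existing entry: value updated only
        } else {
            table.put(convertKey(key), value);  // new entry: key converted a second time
        }
    }

    /** Converts once and reuses the converted key for the new entry. */
    static void putSingleConvert(Map<String, Object> table, Object key, Object value) {
        String converted = convertKey(key);
        table.put(converted, value);
    }

    public static void main(String[] args) {
        Map<String, Object> table = new HashMap<>();
        conversions = 0;
        putDoubleConvert(table, "Key", 1);
        System.out.println("double convert: " + conversions + " conversions");

        table.clear();
        conversions = 0;
        putSingleConvert(table, "Key", 1);
        System.out.println("single convert: " + conversions + " conversions");
    }
}
```

The second conversion only happens when a new entry is created, which is why a benchmark that adds mostly unique keys shows the largest effect.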
I added random long values and random Strings of length 50 created from the
Base64 alphabet.
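The key data could be generated along these lines. This is a sketch only: the class name, the method names, the use of SplittableRandom and the seed parameter are assumptions, not the code from the benchmark.

```java
import java.util.SplittableRandom;

/** Hypothetical key generator for the two object types in the table. */
public class BenchmarkKeys {
    /** The standard Base64 alphabet (64 characters). */
    static final String BASE64_ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Builds a random String of the given length from the Base64 alphabet. */
    static String randomBase64(SplittableRandom rng, int length) {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = BASE64_ALPHABET.charAt(rng.nextInt(BASE64_ALPHABET.length()));
        }
        return new String(chars);
    }

    /** Creates 'size' keys: random Long values, or Base64 Strings of length 50. */
    static Object[] createKeys(String objectType, int size, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        Object[] keys = new Object[size];
        for (int i = 0; i < size; i++) {
            if ("Long".equals(objectType)) {
                keys[i] = Long.valueOf(rng.nextLong());
            } else {
                keys[i] = randomBase64(rng, 50);
            }
        }
        return keys;
    }
}
```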
||mapType||objectType||size||Method||Score||Error||Speed increase||
|CaseInsensitiveMap|Long|10000|put|2159865.547|33780.31237| |
|CaseInsensitiveMapCache|Long|10000|put|1663796.928|54252.36748|1.298|
|CaseInsensitiveMapSingleConvert|Long|10000|put|1361187.795|56001.07637|1.587|
|CaseInsensitiveMapToLower|Long|10000|put|1443153.695|103557.629|1.497|
|CaseInsensitiveMapSingleConvertToLower|Long|10000|put|1110610.616|34136.88536|1.945|
|CaseInsensitiveMap|Base64|10000|put|1.46E+07|267112.2832| |
|CaseInsensitiveMapCache|Base64|10000|put|8068577.213|378623.5886|1.812|
|CaseInsensitiveMapSingleConvert|Base64|10000|put|7646249.161|188859.4017|1.912|
|CaseInsensitiveMapToLower|Base64|10000|put|9805182.416|236714.5242|1.491|
|CaseInsensitiveMapSingleConvertToLower|Base64|10000|put|5549242.736|196453.5884|2.635|
This is a best case for performance changes as objects are very unlikely to be
duplicates and all objects are added to the map.
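The "Speed increase" column is consistent with dividing the baseline CaseInsensitiveMap score by the score of each variant for the same object type, which assumes a lower score is better. A small sketch of that arithmetic, with a hypothetical class and method name:

```java
import java.util.Locale;

/** Recomputes the speed increase column from the scores in the table. */
public class SpeedIncrease {
    /** Ratio of baseline score to variant score (lower score assumed better). */
    static double speedIncrease(double baselineScore, double variantScore) {
        return baselineScore / variantScore;
    }

    public static void main(String[] args) {
        double baselineLong = 2159865.547; // CaseInsensitiveMap, Long keys
        // CaseInsensitiveMapCache, Long keys
        System.out.printf(Locale.ROOT, "%.3f%n", speedIncrease(baselineLong, 1663796.928));
        // CaseInsensitiveMapSingleConvertToLower, Long keys
        System.out.printf(Locale.ROOT, "%.3f%n", speedIncrease(baselineLong, 1110610.616));
    }
}
```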
Observations:
* Avoiding a second conversion of the key is roughly 1.5x faster for Long keys
which require no character conversion and use a limited alphabet.
* Avoiding a second conversion of the key is roughly 2x faster for String
keys which use a larger ASCII alphabet.
* Switching key conversion to use String.toLowerCase is roughly 1.5x faster.
The gain from this method is stable between the Long and String keys.
* Avoiding a second conversion and using String.toLowerCase is roughly 1.9x
to 2.6x faster.
So key conversion is the main overhead in the map put function.
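For reference, the two conversion strategies being compared can be sketched as follows. The per-character version is written from my understanding of the existing convertKey and should be treated as an assumption, not a copy of the library source. The class and method names are hypothetical, and null handling is omitted.

```java
import java.util.Locale;

/** Hypothetical side-by-side of the two key conversion strategies. */
public class KeyConversionSketch {
    /** Folds case one char at a time (upper case first, then lower case). */
    static String convertPerChar(Object key) {
        char[] chars = key.toString().toCharArray();
        for (int i = chars.length - 1; i >= 0; i--) {
            chars[i] = Character.toLowerCase(Character.toUpperCase(chars[i]));
        }
        return new String(chars);
    }

    /** Delegates to the JDK's locale-independent lower-casing. */
    static String convertToLower(Object key) {
        return key.toString().toLowerCase(Locale.ROOT);
    }
}
```

Both give the same result for the ASCII keys used in this benchmark. They may differ for some non-ASCII characters, so switching the conversion is a behavioural change as well as a performance one.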
> CaseInsensitiveMap prevent duplicate key conversion on put
> ----------------------------------------------------------
>
> Key: COLLECTIONS-803
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-803
> Project: Commons Collections
> Issue Type: Improvement
> Components: Map
> Affects Versions: 4.4
> Reporter: Simulant
> Priority: Minor
> Labels: performance
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When adding a new item into a {{CaseInsensitiveMap}} the {{convertKey(key)}}
> method is called twice, once in the {{put(key, value)}} method and a second
> time in the {{createEntry(next, hashCode, key, value)}} method. The result
> could be reused, resulting in better performance.
> Depending on the {{toString()}} implementation of the key and the length of
> the resulting string, the lower case conversion can get expensive and should
> not be performed twice. Unlike the default implementation in
> {{AbstractHashedMap}}, the {{convertKey(key)}} override in
> {{CaseInsensitiveMap}} has a cost that depends on the input.