[ 
https://issues.apache.org/jira/browse/LANG-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary D. Gregory resolved LANG-1802.
-----------------------------------
    Fix Version/s: 3.20.1
       Resolution: Fixed

Hello [~zhongxin]
Thank you for the report and pull request.
This PR does NOT include a unit test!
This is a good find, but the JRE already provides a solution. Please see git 
master.


> Fix collision in CharRange.hashCode()
> -------------------------------------
>
>                 Key: LANG-1802
>                 URL: https://issues.apache.org/jira/browse/LANG-1802
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.20.0
>            Reporter: Zhongxin Yan
>            Priority: Major
>             Fix For: 3.20.1
>
>
> *1. Background*
> The CharRange class (package: org.apache.commons.lang3) is a core component 
> of the CharSet utility, and its hashCode() method is frequently invoked in 
> scenarios such as HashMap/HashSet storage. The current implementation of 
> hashCode() (linear combination) has severe hash collision problems and 
> suboptimal calculation efficiency, which affects the performance of 
> upper-layer applications.
> {code:java}
> //Current hashCode implementation
> @Override
> public int hashCode() {
>     return 83 + start + 7 * end + (negated ? 1 : 0);
> }{code}
> h5. *2. Problem Description*
> The current {{hashCode()}} implementation has two critical issues:
>  * {*}High collision rate{*}: The linear combination ({{{}83 + start + 7 * 
> end + flag{}}}) easily causes hash value offset. For example:
>  ** {{CharRange.isNotIn((char) 1, (char) 2)}} (start=1, end=2, negated=true) 
> → hash = 83+1+14+1=99
>  ** {{CharRange.isIn((char) 2, (char) 2)}} (start=2, end=2, negated=false) → 
> hash = 83+2+14=99
>  These two logically distinct instances (equals() returns false) have the 
> same hash code, leading to serious bucket conflicts in {{{}HashMap{}}}.
>  * {*}Low calculation efficiency{*}: Relies on multiple arithmetic operations 
> (1 multiplication + 3 additions), which are slower than bitwise operations 
> and fail to utilize the 32-bit space of {{int}} efficiently.
> {code:java}
> //  hash collision cases
> @Test
> void testHashCodeCollision() {
>     // case A:hash=99
>     CharRange a1 = CharRange.isNotIn((char)1, (char)2); // 1,2,true → 
> 83+1+14+1=99
>     CharRange a2 = CharRange.isIn((char)2, (char)2);    // 2,2,false → 
> 83+2+14+0=99
>     assertEquals(a1.hashCode(), a2.hashCode()); // Collision
>     // case B:hash=123
>     CharRange b1 = CharRange.isIn((char)5, (char)5);    //5,5,false 
> →83+5+35+0=123
>     CharRange b2 = CharRange.isNotIn((char)4, (char)5); //4,5,true 
> →83+4+35+1=123
>     assertEquals(b1.hashCode(),b2.hashCode()); //Collision
> } {code}
> *3. Proposed Solution*
> Replace the current linear combination implementation with a bitwise splicing 
> + XOR flag scheme, which maximizes the use of 32-bit int space, minimizes 
> collision rate, and optimizes calculation efficiency:[Github 
> PR|https://github.com/apache/commons-lang/pull/1524] [~ggregory] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to