[jira] [Commented] (COLLECTIONS-855) Update the EnhancedDoubleHasher to correct the cube component of the hash

Alex Herbert (Jira) Wed, 29 May 2024 05:35:06 -0700


    [ 
https://issues.apache.org/jira/browse/COLLECTIONS-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850349#comment-17850349
 ]


Alex Herbert commented on COLLECTIONS-855:
------------------------------------------

It is not a breaking API change; it is a functional change. It may break code 
that relies on the output sequence, but the hasher should be treated as a black 
box. Users should not rely on the output, just as they should not rely on the 
hash code of an object being a specific value.

I already made the change locally. It only breaks the EnhancedDoubleHasher 
tests, because of their hard-coded expected sequence.

IIUC there are two ways to do it. The second, which adds an extra test outside 
the loop, will probably not have coverage for that test:

Change the loop to:
{code:java}
                    // Old:
                    // for (int i = 0; i < k; i++) {
                    for (int i = 1; i <= k; i++) {
                        if (!consumer.test(index)) {
                            return false;
                        }
                        // Update index and handle wrapping
                        index -= inc;
                        index = index < 0 ? index + bits : index;

                        // Incorporate the counter into the increment to create a
                        // tetrahedral number additional term, and handle wrapping.
                        inc -= i;
                        inc = inc < 0 ? inc + bits : inc;
                    }
{code}

Change the loop to only compute an update if it is to be consumed:

{code:java}
                    if (!consumer.test(index)) {
                        return false;
                    }
                    for (int i = 1; i < k; i++) {
                        // Update index and handle wrapping
                        index -= inc;
                        index = index < 0 ? index + bits : index;

                        // Incorporate the counter into the increment to create a
                        // tetrahedral number additional term, and handle wrapping.
                        inc -= i;
                        inc = inc < 0 ? inc + bits : inc;

                        if (!consumer.test(index)) {
                            return false;
                        }
                    }
{code}
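
To see that the two variants above emit the same indices, and that both differ from the old loop, here is a minimal stand-alone sketch. The class and method names (LoopVariants, testThenUpdate, updateThenTest, collect) are hypothetical and not part of Commons Collections. The starting values (index 10, inc 5, bits 17, k 4) are arbitrary, and the sketch assumes 1 <= k <= bits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical stand-alone harness; not part of Commons Collections. */
final class LoopVariants {

    /**
     * Test-then-update loop. first = 0 reproduces the old loop (i = 0; i < k),
     * first = 1 reproduces the first proposed variant (i = 1; i <= k).
     */
    static boolean testThenUpdate(int first, int index, int inc, int bits, int k,
                                  IntPredicate consumer) {
        for (int i = first; i < first + k; i++) {
            if (!consumer.test(index)) {
                return false;
            }
            // Update index and handle wrapping
            index -= inc;
            index = index < 0 ? index + bits : index;
            // Fold the counter into the increment and handle wrapping
            inc -= i;
            inc = inc < 0 ? inc + bits : inc;
        }
        return true;
    }

    /** Second proposed variant: only compute an update if it will be consumed. Assumes k >= 1. */
    static boolean updateThenTest(int index, int inc, int bits, int k,
                                  IntPredicate consumer) {
        if (!consumer.test(index)) {
            return false;
        }
        for (int i = 1; i < k; i++) {
            index -= inc;
            index = index < 0 ? index + bits : index;
            inc -= i;
            inc = inc < 0 ? inc + bits : inc;
            if (!consumer.test(index)) {
                return false;
            }
        }
        return true;
    }

    /** Collects the emitted indices. variant: 0 = old loop, 1 = first proposal, 2 = second proposal. */
    static List<Integer> collect(int variant, int index, int inc, int bits, int k) {
        List<Integer> out = new ArrayList<>();
        IntPredicate sink = v -> out.add(v);
        if (variant == 2) {
            updateThenTest(index, inc, bits, k, sink);
        } else {
            testThenUpdate(variant, index, inc, bits, k, sink);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(0, 10, 5, 17, 4)); // old loop:        [10, 5, 0, 13]
        System.out.println(collect(1, 10, 5, 17, 4)); // first proposal:  [10, 5, 1, 16]
        System.out.println(collect(2, 10, 5, 17, 4)); // second proposal: [10, 5, 1, 16]
    }
}
```

The first two indices are the same for old and new loops; the sequences diverge from the third index onward, which is where the counter term first becomes non-zero. The second variant skips the final update of index and inc that the first variant computes but never consumes.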


> Update the EnhancedDoubleHasher to correct the cube component of the hash
> -------------------------------------------------------------------------
>
>                 Key: COLLECTIONS-855
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-855
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Bloomfilter
>    Affects Versions: 4.5.0-M1
>            Reporter: Alex Herbert
>            Priority: Blocker
>
> The EnhancedDoubleHasher currently computes the hash with the cube component 
> lagging by 1:
> {noformat}
> hash[i] = ( h1(x) - i*h2(x) - ((i-1)^3 - (i-1))/6 ) wrapped in [0, bits){noformat}
> Correct this to the intended:
> {noformat}
> hash[i] = ( h1(x) - i*h2(x) - (i*i*i - i)/6 ) wrapped in [0, bits){noformat}
> This is a simple change in the current controlling loop from:
> {code:java}
> for (int i = 0; i < k; i++) { {code}
> to:
> {code:java}
> for (int i = 1; i <= k; i++) { {code}
>  
> Issue notified by Juan Manuel Gimeno Illa on the Commons dev mailing list 
> (see [https://lists.apache.org/thread/wjmwxzozrtf41ko9r0g7pzrrg11o923o]).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
