[ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552727#comment-17552727 ]

Uwe Schindler commented on LUCENE-10610:
----------------------------------------

I checked the code. If you look at equals(), you see that the automaton itself is 
never part of the comparison. It is only referenced in the constructor and used 
for some calculations in derived objects.

RunAutomaton's equals() only compares the local arrays. Therefore hashCode() must 
also be computed from them. As computing hash codes of arrays is expensive, the 
result should be cached.

Actually it is quite easy: either compute the integer hash code in the 
constructor and just return it, or initialize it lazily. The arrays and bitsets 
compared in equals() never change after construction.
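A minimal sketch of the eager option, assuming a simplified, hypothetical class `CachedHashRunAutomaton` whose fields (`alphabetSize`, `size`, `points`, `transitions`, `accept`) stand in for RunAutomaton's state; they are not Lucene's actual internals, and `java.util.BitSet` replaces Lucene's own bitset. It shows `hashCode()` mixing in exactly the fields `equals()` compares, including array contents, with the cost paid once in the constructor. The defensive copies are an addition of this sketch.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical, simplified stand-in for RunAutomaton (field names are assumptions).
 * All state is final and copied on construction, so a hash computed once stays valid.
 */
final class CachedHashRunAutomaton {
  private final int alphabetSize;
  private final int size;
  private final int[] points;
  private final int[] transitions;
  private final BitSet accept;
  // Eager option: computed once in the constructor, then just returned.
  private final int hash;

  CachedHashRunAutomaton(
      int alphabetSize, int size, int[] points, int[] transitions, BitSet accept) {
    this.alphabetSize = alphabetSize;
    this.size = size;
    // Defensive copies (added in this sketch) keep the cached hash from going stale.
    this.points = points.clone();
    this.transitions = transitions.clone();
    this.accept = (BitSet) accept.clone();
    this.hash = computeHash();
  }

  // Hashes exactly the fields equals() compares, including the array contents.
  private int computeHash() {
    int result = 1;
    result = 31 * result + alphabetSize;
    result = 31 * result + size;
    result = 31 * result + Arrays.hashCode(points);
    result = 31 * result + Arrays.hashCode(transitions);
    result = 31 * result + accept.hashCode();
    return result;
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof CachedHashRunAutomaton)) {
      return false;
    }
    CachedHashRunAutomaton other = (CachedHashRunAutomaton) obj;
    return alphabetSize == other.alphabetSize
        && size == other.size
        && Arrays.equals(points, other.points)
        && Arrays.equals(transitions, other.transitions)
        && accept.equals(other.accept);
  }
}
```

The lazy variant would instead keep a non-final `int hash` field, compute it on the first `hashCode()` call while it is still 0, and store it. Because the value is deterministic and `int` writes are atomic, two threads racing to fill it in just compute the same number twice, a similar idiom to the hash cache in `java.lang.String`.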

> RunAutomaton#hashCode() can easily cause hash collision for different Automatons
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10610
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tomoko Uchida
>            Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take the contents of the {{points}} array into account, this 
> returns the same value for different automatons whenever their alphabet size, 
> number of points, and number of states are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I don't think it's a serious one: all callers of this {{hashCode()}} 
> take additional information into account when calculating their own hash 
> values, so there seems to be no substantial impact on higher-level APIs.
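To see the collision without building real queries, the sketch below re-implements the quoted formula as a static helper and compares it with a variant that also hashes the contents of `points` via `java.util.Arrays.hashCode`. The class and method names are hypothetical, and the `points` values and sizes are illustrative assumptions loosely modeled on prefix automata for "aba" and "fee", not values dumped from Lucene.

```java
import java.util.Arrays;

final class HashCollisionDemo {
  // Mirrors the formula quoted in the issue: only the sizes are hashed.
  static int sizeOnlyHash(int alphabetSize, int[] points, int size) {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + points.length;
    result = prime * result + size;
    return result;
  }

  // Same shape, but the contents of points are mixed in via Arrays.hashCode.
  static int contentHash(int alphabetSize, int[] points, int size) {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + Arrays.hashCode(points);
    result = prime * result + size;
    return result;
  }

  public static void main(String[] args) {
    // Illustrative values only (assumed): same alphabet size, same number of
    // points, same number of states, but different point values.
    int[] pointsAba = {0, 97, 98, 99};
    int[] pointsFee = {0, 101, 102, 103};
    System.out.println("size-only hashes equal: "
        + (sizeOnlyHash(256, pointsAba, 4) == sizeOnlyHash(256, pointsFee, 4)));
    System.out.println("content hashes equal: "
        + (contentHash(256, pointsAba, 4) == contentHash(256, pointsFee, 4)));
  }
}
```

Running `main` prints `true` for the size-only hash and `false` for the content-aware one. Hashing the array contents costs time proportional to the array length, which is why caching the result matters.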



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
