[ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552909#comment-17552909 ]
Tomoko Uchida commented on LUCENE-10610:
----------------------------------------

I may be completely missing the point, so correct me if I'm wrong, but would it make sense to restrict the visibility of all methods that change the internal state of {{Automaton}}, making it immutable (from the perspective of code outside the package)? It looks like {{Automaton.Builder}} has all the necessary methods to build an Automaton instance, so I wonder if it would be sufficient to expose only the builder class. It feels a bit awkward that the built Automaton instance can still be modified after the builder has "finished" it.

If we make it immutable, we could safely attach a (pre-computed or on-the-fly) hash value to it, and in return we could remove {{hashCode()}} and {{equals()}} from the {{CompiledAutomaton}} and {{RunAutomaton}} classes. Rather than having them in the classes that run the actual matching operations, I guess it would be more natural to have them in {{Automaton}}, which, like the {{Query}} class, is the prototype and the highest-level interface?

> RunAutomaton#hashCode() can easily cause hash collision for different Automatons
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10610
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tomoko Uchida
>            Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take the contents of the {{points}} array into account, this
> returns the same value for different automatons when their alphabet size and
> state size are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I think it's not a serious one; all callers of this {{hashCode()}}
> take additional information into account when calculating their own hash value,
> so there seems to be no substantial impact on higher-level APIs.
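
To make the collision described in the issue concrete without depending on Lucene, here is a minimal sketch. {{RunAutomatonSketch}} is a hypothetical stand-in, not the Lucene class: it keeps only the three fields the quoted {{hashCode()}} reads, and the {{points}} values are made up for illustration. {{lengthOnlyHash()}} mirrors the quoted method, while {{contentAwareHash()}} is one assumed alternative that mixes in the array contents via {{Arrays.hashCode}}. It is not necessarily what a real fix would look like, and a real fix might also need to cover the transitions.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the fields the quoted hashCode() reads; not the Lucene class. */
final class RunAutomatonSketch {
  final int alphabetSize;
  final int size;     // number of states
  final int[] points; // interval boundaries (made-up values in this sketch)

  RunAutomatonSketch(int alphabetSize, int size, int[] points) {
    this.alphabetSize = alphabetSize;
    this.size = size;
    this.points = points.clone(); // defensive copy so the instance cannot change later
  }

  /** Mirrors the hashCode() quoted in the issue: only sizes and the array length are mixed in. */
  int lengthOnlyHash() {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + points.length;
    result = prime * result + size;
    return result;
  }

  /** Assumed alternative: mix in the contents of points instead of only its length. */
  int contentAwareHash() {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + Arrays.hashCode(points);
    result = prime * result + size;
    return result;
  }

  public static void main(String[] args) {
    // Same alphabet size, state count, and array length, but different boundaries.
    RunAutomatonSketch a = new RunAutomatonSketch(4, 5, new int[] {0, 'a', 'b', 'c'});
    RunAutomatonSketch b = new RunAutomatonSketch(4, 5, new int[] {0, 'e', 'f', 'g'});
    System.out.println("length-only hashes equal:   " + (a.lengthOnlyHash() == b.lengthOnlyHash()));
    System.out.println("content-aware hashes equal: " + (a.contentAwareHash() == b.contentAwareHash()));
  }
}
```

The defensive copy in the constructor reflects the point about immutability: a content-based hash is only safe to cache if the hashed state cannot change afterwards.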