mikemccand commented on code in PR #13072:
URL: https://github.com/apache/lucene/pull/13072#discussion_r1504523798


##########
lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java:
##########
@@ -92,6 +93,7 @@ public Automaton() {
   public Automaton(int numStates, int numTransitions) {
     states = new int[numStates * 2];
     isAccept = new BitSet(numStates);
+    terminable = new BitSet(numStates);

Review Comment:
   One spooky thing about this new `BitSet` is that it is only "best effort" 
now?  I.e. one could create an `Automaton` that does have some states that 
match all suffixes, but forget to set the bit here?  E.g. if I build a 
`RegexpQuery` that is actually equivalent to a `PrefixQuery`, we won't set this?
   
   Everything else about `Automaton` today is fundamental (states, transitions, 
isAccept) and necessary, but this new member is more of a best-effort 
optimization?



##########
lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java:
##########
@@ -70,6 +70,7 @@ public class Automaton implements Accountable, TransitionAccessor {
   private int[] states;
 
   private final BitSet isAccept;
+  private final BitSet terminable;

Review Comment:
   At first I couldn't understand what `terminable` means.
   
   If a bit is set for a state, does that mean this state accepts everything 
from now on (all possible suffixes)?
   
   Could we maybe rename it to `isMatchAllSuffix` or so?
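   
   A minimal sketch of that reading, not taken from the PR: once a run reaches 
a state whose bit is set, the rest of the input no longer matters and the run 
can stop early. The class `MatchAllSuffixDemo`, the dense `TRANSITIONS` table 
over the two-letter alphabet {a, b}, and the `stepsTaken` counter are all 
hypothetical; only the name `isMatchAllSuffix` comes from the suggestion above.

```java
import java.util.BitSet;

final class MatchAllSuffixDemo {
  // Hypothetical dense DFA over the alphabet {a, b}:
  // TRANSITIONS[state][symbol] is the next state, or -1 for "no transition".
  static final int[][] TRANSITIONS = {
    {1, -1}, // state 0: 'a' -> 1, 'b' is rejected
    {1, 1},  // state 1: every symbol loops back to state 1
  };

  // Number of transitions followed by the most recent call to run().
  static int stepsTaken;

  // Assumes input contains only the characters 'a' and 'b'.
  static boolean run(String input, BitSet isAccept, BitSet isMatchAllSuffix) {
    int state = 0;
    stepsTaken = 0;
    for (int i = 0; i < input.length(); i++) {
      if (isMatchAllSuffix.get(state)) {
        return true; // every possible suffix is accepted from here, stop scanning
      }
      state = TRANSITIONS[state][input.charAt(i) - 'a'];
      stepsTaken++;
      if (state == -1) {
        return false; // dead end
      }
    }
    return isAccept.get(state);
  }
}
```

   In this sketch, leaving a bit unset changes only how much input is scanned, 
not the result: `run("abbbb", ...)` returns true either way, but follows one 
transition when the bit for state 1 is set and five when it is not.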



##########
lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java:
##########
@@ -67,12 +68,16 @@ protected RunAutomaton(Automaton a, int alphabetSize) {
     points = a.getStartPoints();
     size = Math.max(1, a.getNumStates());
     accept = new FixedBitSet(size);
+    terminable = new FixedBitSet(size);

Review Comment:
   Perhaps, instead of storing this concept in `Automaton`, we could store it 
solely in `RunAutomaton`, if we can efficiently find accept states that 
effectively have `.*` transitions to themselves?
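   
   A rough sketch of what such a check could look like, under stated 
assumptions rather than against Lucene's actual classes: the automaton is 
deterministic, each state's transitions are sorted by `min` and do not 
overlap, and `MatchAllSuffixFinder`, its nested `Transition` class, and `find` 
are hypothetical names. A state qualifies when it is an accept state and its 
self-loops cover the whole alphabet without a gap.

```java
import java.util.BitSet;

final class MatchAllSuffixFinder {
  /** Hypothetical transition: every symbol in [min, max] leads to dest. */
  static final class Transition {
    final int min;
    final int max;
    final int dest;

    Transition(int min, int max, int dest) {
      this.min = min;
      this.max = max;
      this.dest = dest;
    }
  }

  /**
   * Returns the accept states whose transitions all loop back to the state
   * itself and together cover every symbol in [minSymbol, maxSymbol].
   * Assumes transitions[state] is sorted by min and non-overlapping.
   */
  static BitSet find(Transition[][] transitions, BitSet isAccept, int minSymbol, int maxSymbol) {
    BitSet result = new BitSet(transitions.length);
    for (int state = 0; state < transitions.length; state++) {
      if (!isAccept.get(state)) {
        continue;
      }
      int next = minSymbol; // smallest symbol not yet covered by a self-loop
      boolean onlySelfLoops = true;
      for (Transition t : transitions[state]) {
        if (t.dest != state || t.min > next) {
          onlySelfLoops = false; // leaves the state, or leaves a gap in coverage
          break;
        }
        next = t.max + 1;
      }
      if (onlySelfLoops && next > maxSymbol) {
        result.set(state);
      }
    }
    return result;
  }
}
```

   The scan is linear in the number of transitions. It only recognizes a 
single state looping on itself, which is the shape a match-all-suffix state 
takes in a minimized deterministic automaton; in a non-minimal automaton, a 
group of accept states that cycle among themselves would be missed.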



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

