[ 
https://issues.apache.org/jira/browse/LUCENE-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030834#comment-14030834
 ] 

Michael McCandless commented on LUCENE-5752:
--------------------------------------------

LightAutomaton is very different from Automaton:

  * LA is more like String and A is more like StringBuilder: once an
    LA is built, you can't link it up to another LA just by changing a
    few transitions

  * It's immutable once created; operations like determinize,
    minimize, totalize, etc. no longer happen "in place" and
    instead return a new LA.  I'd really love to forbid calling
    e.g. determinize without "using" its result, since I could easily
    have caused bugs this way.  This also means we don't have
    ops like "cloneIfRequired" and "setAllowMutate".

  * LA knows all of its states, vs A which "defines" its states as
    those reachable from initial.  This means LA can have different
    kinds of dead states than A (states that can reach an accept state
    but are not reachable from the initial state)

  * LA doesn't have mutable state, e.g. get/set/clearNumberedStates;
    states are already numbered as they are created (and only exist as
    ints).  There is no "sortTransitions"/reduce: LA's transitions are
    already reduced/sorted as they are created

  * Initial state is always 0

  * There's no special casing for singletons; it's just a normal LA

  * No setMinimizeAlways
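
To make the list above concrete, here is a minimal sketch of the
general shape.  It is not the code from the patch: TinyAutomaton, its
Builder and withAcceptFlipped are hypothetical names, and
withAcceptFlipped is only a trivial stand-in for real operations such
as determinize or minimize.  It shows int-only states, initial state
0, unreachable states still being counted, and an operation that
returns a new instance instead of changing the receiver.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene's LightAutomaton. */
final class TinyAutomaton {
  private final int numStates;
  private final BitSet accept;       // never modified after construction
  private final int[][] transitions; // rows of {source, dest, min, max}

  private TinyAutomaton(int numStates, BitSet accept, int[][] transitions) {
    this.numStates = numStates;
    this.accept = accept;
    this.transitions = transitions;
  }

  /** Counts every created state, including ones unreachable from state 0. */
  int getNumStates() { return numStates; }

  boolean isAccept(int state) { return accept.get(state); }

  /** Returns the destination for (state, label), or -1 if there is none. */
  int step(int state, int label) {
    for (int[] t : transitions) {
      if (t[0] == state && t[2] <= label && label <= t[3]) return t[1];
    }
    return -1;
  }

  /** Runs the input starting from state 0, which is always the initial state. */
  boolean run(String s) {
    int state = 0;
    for (int i = 0; i < s.length() && state != -1; i++) {
      state = step(state, s.charAt(i));
    }
    return state != -1 && isAccept(state);
  }

  /** Stand-in for an operation: returns a NEW automaton, leaves this one as is. */
  TinyAutomaton withAcceptFlipped() {
    BitSet flipped = (BitSet) accept.clone();
    flipped.flip(0, numStates);
    // Sharing the transition rows is safe because no instance ever mutates them.
    return new TinyAutomaton(numStates, flipped, transitions);
  }

  /** The mutable phase lives here; finish() freezes the result. */
  static final class Builder {
    private int numStates;
    private final BitSet accept = new BitSet();
    private final List<int[]> transitions = new ArrayList<>();

    /** States are plain ints, numbered in creation order starting at 0. */
    int createState() { return numStates++; }

    void setAccept(int state) { accept.set(state); }

    void addTransition(int source, int dest, int min, int max) {
      transitions.add(new int[] {source, dest, min, max});
    }

    TinyAutomaton finish() {
      return new TinyAutomaton(
          numStates, (BitSet) accept.clone(), transitions.toArray(new int[0][]));
    }
  }
}
```

With this shape, calling a.withAcceptFlipped() and dropping the result
silently does nothing, which is exactly the kind of mistake described
in the second bullet.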

Unfortunately this means operations that used to "just link states
together," like concatenate, now do a full copy of the incoming
automata ... so the problem here is that these restrictions may be too
much for our usage.  E.g. RegExp keeps chaining and chaining automata
together... in some cases I think we can fix the usage to do the
building directly, but in other cases I'm not sure.
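
As a rough illustration of that cost, here is a sketch of concatenate
over an immutable, int-based automaton.  FrozenAutomaton and its
fields are hypothetical names, not the API in the patch, and the
sketch assumes each input has at least its initial state 0.  Since
neither input may be modified, every state and transition of both is
re-created in the result.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical immutable automaton: int states, initial state 0. Not Lucene's class. */
final class FrozenAutomaton {
  final int numStates;
  final BitSet accept;           // treated as read-only
  final List<int[]> transitions; // rows of {source, dest, min, max}

  FrozenAutomaton(int numStates, BitSet accept, List<int[]> transitions) {
    this.numStates = numStates;
    this.accept = (BitSet) accept.clone();
    this.transitions = List.copyOf(transitions);
  }

  /** Concatenation by copying: cost is proportional to the size of a plus b. */
  static FrozenAutomaton concatenate(FrozenAutomaton a, FrozenAutomaton b) {
    int offset = a.numStates; // b's state s becomes s + offset in the result
    List<int[]> out = new ArrayList<>();
    BitSet accept = new BitSet();
    for (int[] t : a.transitions) out.add(t.clone());
    for (int[] t : b.transitions) {
      out.add(new int[] {t[0] + offset, t[1] + offset, t[2], t[3]});
    }
    // No epsilon edges here: each accept state of a instead receives a copy
    // of the transitions leaving b's initial state.
    for (int s = a.accept.nextSetBit(0); s >= 0; s = a.accept.nextSetBit(s + 1)) {
      for (int[] t : b.transitions) {
        if (t[0] == 0) out.add(new int[] {s, t[1] + offset, t[2], t[3]});
      }
      if (b.accept.get(0)) accept.set(s); // b accepts the empty string
    }
    for (int s = b.accept.nextSetBit(0); s >= 0; s = b.accept.nextSetBit(s + 1)) {
      accept.set(s + offset);
    }
    return new FrozenAutomaton(a.numStates + b.numStates, accept, out);
  }

  /** Nondeterministic run, since the concatenated result may be an NFA. */
  boolean run(String s) {
    BitSet current = new BitSet();
    current.set(0);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      BitSet next = new BitSet();
      for (int[] t : transitions) {
        if (current.get(t[0]) && t[2] <= c && c <= t[3]) next.set(t[1]);
      }
      current = next;
    }
    return current.intersects(accept);
  }
}
```

Chaining n automata this way re-copies the accumulated result at each
step, which is why usage like RegExp's is a concern.  Note also that
b's old initial state can end up unreachable in the result yet is
still one of its states.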


> Explore light weight Automaton replacement
> ------------------------------------------
>
>                 Key: LUCENE-5752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5752
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>
> This effort started with the patch on LUCENE-4556, to create a "light
> weight" replacement for the current object-heavy Automaton class
> (which creates separate State and Transition objects).
> I took that initial patch much further, and cutover most places in
> Lucene that use Automaton to LightAutomaton.  Tests pass.
> The core idea of LightAutomaton is all states are ints, and you build
> up the automaton under the restriction that you add all outgoing
> transitions one state at a time.  This worked well for most
> operations, but for some (e.g. UTF32ToUTF8!!) it was harder, so I also
> added a separate builder to add transitions in any order and then in
> the end they are sorted and added to the real automaton.
> If this is successful I think we should just replace the current
> Automaton with LightAutomaton; right now they both exist in my current
> patch...
> This is very much a work in progress, and I'm not sure the
> restrictions the API imposes are "reasonable" (some algos got uglier).
> But I think it's at least worth exploring/iterating... I'll make a branch and
> commit my current state.
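
The "separate builder" idea in the description quoted above can be
sketched roughly as follows.  StrictAutomaton and AnyOrderBuilder are
hypothetical names, not the classes in the patch: the strict class
enforces the "all outgoing transitions one state at a time" rule,
while the builder buffers transitions in any order and sorts them
before replaying them into the strict class.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical strict target: a state's transitions must be added together. */
final class StrictAutomaton {
  private final List<int[]> transitions = new ArrayList<>(); // {source, dest, min, max}
  private final BitSet finished = new BitSet();
  private int current = -1;

  void addTransition(int source, int dest, int min, int max) {
    if (source != current) {
      // Moving on to a different state closes the previous one for good.
      if (finished.get(source)) {
        throw new IllegalStateException("state " + source + " is already finished");
      }
      if (current != -1) finished.set(current);
      current = source;
    }
    transitions.add(new int[] {source, dest, min, max});
  }

  int getNumTransitions() { return transitions.size(); }

  int[] getTransition(int index) { return transitions.get(index).clone(); }
}

/** Hypothetical builder: accepts transitions in any order, sorts them at the end. */
final class AnyOrderBuilder {
  private final List<int[]> pending = new ArrayList<>();

  void addTransition(int source, int dest, int min, int max) {
    pending.add(new int[] {source, dest, min, max});
  }

  StrictAutomaton finish() {
    // Sort by source state, then by min label, so each state's transitions
    // become contiguous before they reach the strict automaton.
    pending.sort(Comparator.<int[]>comparingInt(t -> t[0]).thenComparingInt(t -> t[2]));
    StrictAutomaton a = new StrictAutomaton();
    for (int[] t : pending) a.addTransition(t[0], t[1], t[2], t[3]);
    return a;
  }
}
```

The trade-off is extra memory for the buffered transitions plus a sort
at the end, in exchange for not having to restructure algorithms that
naturally emit transitions out of order.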



--
This message was sent by Atlassian JIRA
(v6.2#6252)
