What you're describing is ordinary composition of automata (the same thing that happens, for example, when the clauses of a regular expression are composed). If I recall correctly, the Levenshtein automaton in Lucene is built on modified brics code... if so, this should not be a problem. The problem may be that automata are currently used in term enums in a way that skips from one accepted sequence to the next accepted sequence (where possible). If the automaton contains * operators, there is no way to establish these skip targets and everything falls back to a full matching strategy.
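To make the concatenation concrete, here is a minimal sketch in plain Python. It is not Lucene or brics code: the classes Literals, Levenshtein and Concat and the function matches are hypothetical names made up for illustration, the prefix automaton is simplified to a fixed set of literal strings, and transpositions are not handled. It only shows the standard construction, where every accepting state of the first automaton gets an epsilon move to the start of the second.

```python
class Literals:
    """NFA accepting exactly one of a fixed set of strings, e.g. {"AB", "BA"}."""

    def __init__(self, words):
        self.words = list(words)

    def start(self):
        # State (w, j): the first j characters of word number w have been read.
        return {(w, 0) for w in range(len(self.words))}

    def step(self, states, ch):
        return {(w, j + 1) for (w, j) in states
                if j < len(self.words[w]) and self.words[w][j] == ch}

    def accepting(self, states):
        return any(j == len(self.words[w]) for (w, j) in states)


class Levenshtein:
    """NFA accepting strings within max_edits edits of word (no transpositions)."""

    def __init__(self, word, max_edits):
        self.word, self.max_edits = word, max_edits

    def _close(self, states):
        # Epsilon moves: skip one character of word (a deletion) for one edit.
        todo, seen = list(states), set(states)
        while todo:
            i, e = todo.pop()
            nxt = (i + 1, e + 1)
            if i < len(self.word) and e < self.max_edits and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
        return seen

    def start(self):
        return self._close({(0, 0)})

    def step(self, states, ch):
        out = set()
        for i, e in states:
            if i < len(self.word) and self.word[i] == ch:
                out.add((i + 1, e))          # exact match, no cost
            if e < self.max_edits:
                out.add((i, e + 1))          # ch is an inserted character
                if i < len(self.word):
                    out.add((i + 1, e + 1))  # ch substitutes word[i]
        return self._close(out)

    def accepting(self, states):
        return any(i == len(self.word) for (i, e) in states)


class Concat:
    """Concatenation: strings of `first` followed by strings of `second`."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def _link(self, a, b):
        # Epsilon move from accepting states of `first` to the start of `second`.
        if self.first.accepting(a):
            b = b | self.second.start()
        return a, b

    def start(self):
        return self._link(self.first.start(), set())

    def step(self, states, ch):
        a, b = states
        return self._link(self.first.step(a, ch), self.second.step(b, ch))

    def accepting(self, states):
        return self.second.accepting(states[1])


def matches(nfa, text):
    """Run the NFA over text by tracking the set of live states."""
    states = nfa.start()
    for ch in text:
        states = nfa.step(states, ch)
    return nfa.accepting(states)


# Prefix "AB" or "BA", followed by anything within one edit of "XYZ".
nfa = Concat(Literals(["AB", "BA"]), Levenshtein("XYZ", 1))
print(matches(nfa, "ABXY"))   # True: "XY" is "XYZ" with one delete
print(matches(nfa, "XYZ"))    # False: required prefix is missing
```

Matching a single term this way is the easy part. The sketch says nothing about the enum-side question of skipping ahead to the next accepted term.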
Dawid

On Wed, Aug 10, 2011 at 10:54 AM, eks dev <[email protected]> wrote:
>
> Hi Robert, Mike & other FS(A|T) gurus,
>
> a challenge for you ;)
>
> Would it be possible to combine these brilliant pieces of functionality
> with normal Automaton somehow...
>
> Example to illustrate.
> DirectSpellChecker:
> - where instead of minPrefix, we would specify Regex (other Automaton)
>
> pfxAutomaton = Regex("(AB)|(BA)") // e.g. Saying,
> levAutomaton = LevenshteinAutomata("XYZ")
>
> spell(pfxAutomaton, levAutomaton);
>
> would match terms that start with "AB" or "BA" and whose suffix part is a
> normal edit distance match, like ABXY, with one delete
>
> This would support wild things, like "enable only transpositions in first
> three characters"... In order to get these matches today, you need to make
> Lev. Automata with maxDistance = 2 (which is then a HUGE space to search
> without prefix)... Or generate more Lev. automata and make a union of results
> (expensive to iterate)
>
> Other good use cases are simple to construct...
>
> The most general question: can we support at least concatenation between
> LevenshteinAutomata and normal Automata? Intersection/union would be a crazy
> thing as well? Where we would have:
> FilteringAutomata.intersect(LevenshteinAutomata)... but I guess I am
> dreaming with this one, but concatenation sounds doable (at least prefix
> side)
>
> Cheers,
> Eks
>
