On Wed, Aug 10, 2011 at 7:32 AM, eks dev <[email protected]> wrote:
> Thanks David,
>
> I did not know I can mix Automaton with LevenshteinAutomaton.
>
> What you say is Automaton.concatenate(LevenshteinAutomaton),
> intersect, union would work.
>

You can, by doing this:

  LevenshteinAutomata builder = new LevenshteinAutomata("foobar");
  Automaton a1 = builder.toAutomaton(1); // n=1
  Automaton a2 = builder.toAutomaton(2); // n=2

Other notes:

We actually use these operations (e.g. concatenate) internally,
because FuzzyQuery has historically supported a "prefixLen".
So if you search for foobar with edit distance 1 and a prefixLen of 3,
FuzzyTermsEnum builds a "prefix automaton" for "foo" and concatenates
it with an n=1 automaton for "bar":

  Automaton a = builder.toAutomaton(i);
  // constant prefix
  if (realPrefixLength > 0) {
    Automaton prefix = BasicAutomata.makeString(
        UnicodeUtil.newString(termText, 0, realPrefixLength));
    a = BasicOperations.concatenate(prefix, a);
  }
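
To make it concrete which terms such a concatenated automaton accepts,
here is a rough plain-Java sketch that does not use Lucene at all. It
checks the same condition by brute force: the first prefixLen characters
must match exactly, and the rest must be within n edits of the rest of
the term. The class and method names are made up for illustration, and
it compares UTF-16 chars rather than code points, so treat it as an
approximation of the behaviour, not as what FuzzyTermsEnum does.

```java
// Hypothetical illustration only; not part of Lucene.
final class PrefixFuzzySketch {

    /** Plain Levenshtein distance (insert, delete, substitute) over UTF-16 chars. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * True if candidate starts with the first prefixLen chars of term and the
     * remainder of candidate is within n edits of the remainder of term.
     */
    static boolean accepts(String term, int prefixLen, int n, String candidate) {
        int p = Math.min(prefixLen, term.length());
        if (!candidate.startsWith(term.substring(0, p))) {
            return false; // the constant prefix tolerates no edits
        }
        return editDistance(term.substring(p), candidate.substring(p)) <= n;
    }
}
```

With term "foobar", prefixLen 3 and n=1, this accepts "foobaz" but
rejects "fxobar", because the edit falls inside the constant prefix.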

For the regexp syntax you discuss, you can actually already do this.
This is one reason why RegexpQuery has a constructor that takes an
AutomatonProvider:

  public RegexpQuery(Term term, int flags, AutomatonProvider provider) {
    super(term, new RegExp(term.text(), flags).toAutomaton(provider));
  }

So you can provide an implementation of AutomatonProvider that supports
custom syntax of your own, as long as it's surrounded by angle brackets
< >, e.g. <LEV1:foobar>

AutomatonProvider is a simple interface that resolves named automata:

  public Automaton getAutomaton(String name) throws IOException;

If you do this, make sure you enable named automata (RegExp.AUTOMATON
or of course RegExp.ALL) in the flags!
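
As a rough sketch of the name-handling half of such a provider, here is
plain Java with no Lucene dependency. It assumes getAutomaton receives
the text between the brackets (LEV1:foobar for the example above) and
that the edit distance is a single digit; both are my assumptions, and
the class and method names are hypothetical. The result would then be
fed to new LevenshteinAutomata(term).toAutomaton(n) as shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for parsing names like "LEV1:foobar"; not part of Lucene.
final class LevNameSketch {

    // "LEV", one ASCII digit for the edit distance, ':', then the term itself.
    private static final Pattern NAME =
            Pattern.compile("LEV(\\d):(.*)", Pattern.DOTALL);

    final int maxEdits;
    final String term;

    private LevNameSketch(int maxEdits, String term) {
        this.maxEdits = maxEdits;
        this.term = term;
    }

    /** Returns the parsed name, or null if it is not of the form LEV<n>:<term>. */
    static LevNameSketch parse(String name) {
        if (name == null) {
            return null;
        }
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return null; // not ours; a real provider could try other syntaxes here
        }
        return new LevNameSketch(Integer.parseInt(m.group(1)), m.group(2));
    }
}
```

Returning null for unrecognized names is just a convention of this
sketch; a real provider would need to decide how to report them.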
--
lucidimagination.com