Building FST-like automaton queries

Alan Woodward Tue, 28 Feb 2012 04:34:21 -0800

Hello,

I'm trying to create a Lucene Query that will take a term and expand it to 
include common OCR errors (for example, 'cl' is often misread as 'd', so a 
search for 'clog' should also hit 'dog').  My plan is to do this by generating 
all the possible variants of a term, using an existing list of errors, and then 
somehow mapping this into an AutomatonQuery.  I've been looking around the 
o.a.l.util.automaton and o.a.l.util.fst packages on trunk, and I *think* that 
this is possible, but I'm so far failing to work out how to put the various 
bits together.


I'm thinking it should work like this:
1) expand query term to sorted list of possible matches
2) create an FST over those matches
3) plug this FST into an AutomatonQuery subclass.

1) is easy.  It's 2) and 3) I'm having trouble with.  
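
For concreteness, step 1 might look something like the minimal sketch below. It uses only the JDK, and everything in it is an assumption made for illustration: the class name OcrVariants, the expand method, and the shape of the error list (a map from the correct character sequence to the strings it is commonly misread as) are hypothetical, not part of Lucene. The result is sorted in Java's natural String order, which may or may not be the order a later automaton-building step expects.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not a Lucene class.
class OcrVariants {

    /** Returns the term plus every variant produced by the confusion list, sorted. */
    static SortedSet<String> expand(String term, Map<String, List<String>> confusions) {
        SortedSet<String> out = new TreeSet<>();
        expandFrom(term, 0, new StringBuilder(), confusions, out);
        return out;
    }

    // Walks the term left to right; at each position either keeps the original
    // character or replaces a matching "correct" sequence with a misreading.
    private static void expandFrom(String term, int pos, StringBuilder prefix,
                                   Map<String, List<String>> confusions,
                                   SortedSet<String> out) {
        if (pos == term.length()) {
            out.add(prefix.toString());
            return;
        }
        int mark = prefix.length();

        // Option 1: keep the character at this position unchanged.
        prefix.append(term.charAt(pos));
        expandFrom(term, pos + 1, prefix, confusions, out);
        prefix.setLength(mark);

        // Option 2: apply each confusion whose correct form starts here.
        for (Map.Entry<String, List<String>> e : confusions.entrySet()) {
            String correct = e.getKey();
            if (correct.isEmpty() || !term.startsWith(correct, pos)) {
                continue;
            }
            for (String misread : e.getValue()) {
                prefix.append(misread);
                expandFrom(term, pos + correct.length(), prefix, confusions, out);
                prefix.setLength(mark);
            }
        }
    }

    public static void main(String[] args) {
        // 'cl' is often misread as 'd', so 'clog' should also match 'dog'.
        Map<String, List<String>> confusions = Map.of("cl", List.of("d"));
        System.out.println(expand("clog", confusions)); // prints [clog, dog]
    }
}
```

The number of variants can grow quickly when a term contains several confusable sequences, which is part of why a compact automaton over the variants looks attractive for steps 2 and 3.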

All help gratefully received!

Thanks, 

Alan Woodward
