Hello, I'm trying to create a Lucene Query that will take a term and expand it to include common OCR errors (for example, 'cl' is often misread as 'd', so a search for 'clog' should also hit 'dog'). My plan is to do this by generating all the possible variants of a term, using an existing list of errors, and then somehow mapping this into an AutomatonQuery. I've been looking around the o.a.l.util.automaton and o.a.l.util.fst packages on trunk, and I *think* that this is possible, but I'm so far failing to work out how to put the various bits together.
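
To make the expansion step concrete, here is a minimal sketch in plain Java with no Lucene dependencies. The class name `OcrVariants`, the method `expand`, and the two-entry `CONFUSIONS` table are hypothetical stand-ins for the real error list, and the sketch assumes every substitution makes the term shorter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class OcrVariants {

    // Hypothetical confusion table: the key is the correct character sequence,
    // the value is what OCR tends to produce instead. The real list would be
    // loaded from wherever the known errors are kept.
    static final Map<String, String> CONFUSIONS = new LinkedHashMap<>();
    static {
        CONFUSIONS.put("cl", "d");
        CONFUSIONS.put("rn", "m");
    }

    // Returns the term plus every variant reachable by applying one or more
    // substitutions, in String natural order.
    // Assumption: every substitution shortens the term, so the search is finite.
    static SortedSet<String> expand(String term) {
        SortedSet<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(term);
        queue.add(term);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Map.Entry<String, String> e : CONFUSIONS.entrySet()) {
                String from = e.getKey();
                String to = e.getValue();
                // Substitute each occurrence of 'from' separately, so that
                // partially corrupted variants are generated as well.
                int idx = current.indexOf(from);
                while (idx >= 0) {
                    String variant = current.substring(0, idx) + to
                            + current.substring(idx + from.length());
                    if (seen.add(variant)) {
                        queue.add(variant);
                    }
                    idx = current.indexOf(from, idx + 1);
                }
            }
        }
        return seen;
    }
}
```

With this table, `OcrVariants.expand("clog")` yields `clog` and `dog`. The result is sorted by Java's `String` ordering. Lucene compares terms by their UTF-8 bytes, which can order characters outside the BMP differently, so the list may need re-sorting after conversion to `BytesRef`.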
I'm thinking it should work like this:

1) expand the query term to a sorted list of possible matches
2) create an FST over those matches
3) plug this FST into an AutomatonQuery subclass

1) is easy. It's 2) and 3) I'm having trouble with.

All help gratefully received!

Thanks,
Alan Woodward