Disclaimer
----------

I know basically nothing about OCR, handwriting recognition, speech
recognition, or automatic translation.
Tolerance for Ambiguity Makes Things Easier
-------------------------------------------

Traditionally, the goal of OCR, handwriting recognition, speech
recognition, or automatic translation has been to find the single
correct interpretation of marks on paper or sounds in a signal
stream.  Often, though, it's much easier to produce a description of
a set of possible interpretations (perhaps as a finite-state machine
or infinite-order Markov model) that reliably includes the correct
interpretation than it is to reliably choose the correct
interpretation.  When is this ambiguous representation good enough to
be useful?

Downstream disambiguation
-------------------------

Traditional (e.g.) OCR software proceeds as a pipeline of stages:
convert an image into marks; convert the marks into symbols; convert
the symbols into letters; turn the letters into lines of words,
columns, paragraphs, and so forth.  In its simplest form, this works
very badly; one way to remedy it is to pass sets of candidates from
one stage to the next.  Ocrad, for example, has a -x option to list
all the candidate transliterations for each symbol, including
certainty levels, so that perhaps a downstream program such as a
dictionary-driven spell-checker could disambiguate symbols by
context.  (Unfortunately, it isn't actually useful in Ocrad 0.11,
because it almost never comes up with more than one candidate, and
the candidate weights aren't useful either.)

This is not the only possible way to do downstream disambiguation;
for example, you could resume the upstream algorithm (with Icon or
Python generators, Haskell lazy lists, or Prolog backtracking, or by
simulating these by hand) to make it generate more possibilities, or
you could run through the entire pipeline several times, each time
feeding the output of a later stage back into the previous stage as a
set of weights.
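A toy sketch of dictionary-driven downstream disambiguation: the
per-symbol candidate lists, the confidence weights, and the tiny
dictionary here are all made up for illustration (this is not Ocrad's
actual -x output format), but the shape of the computation is the one
described above.

```python
from itertools import product

# Hypothetical per-symbol candidate lists with made-up confidence
# weights, the sort of thing an OCR stage could emit for one
# three-symbol word; the dictionary is equally made up.
candidates = [
    [('c', 0.6), ('e', 0.4)],
    [('a', 0.7), ('o', 0.3)],
    [('t', 0.5), ('l', 0.5)],
]
dictionary = {'cat', 'cot', 'col', 'eat'}

def disambiguate(candidates, dictionary):
    """Try every combination of candidates, keep only dictionary
    words, and rank them by the product of their weights."""
    scored = []
    for combo in product(*candidates):
        word = ''.join(ch for ch, _ in combo)
        if word in dictionary:
            weight = 1.0
            for _, w in combo:
                weight *= w
            scored.append((word, weight))
    return sorted(scored, key=lambda pair: -pair[1])

print(disambiguate(candidates, dictionary))
# 'cat' (0.6 * 0.7 * 0.5 = 0.21) ranks first
```

The exhaustive enumeration is exponential in word length, of course;
a real system would intersect the candidate lattice with a dictionary
automaton instead, but the ranking idea is the same.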
T9 text input on cell phones is an example of this technique: the
keypad produces a finite-state-machine specification of a word (e.g.
the regexp '[tuv][ghi][def]'), which is then disambiguated to a list
of the most probable candidates (perhaps 'the', 'tie', 'vid', in this
case) by consulting a dictionary; then the person who typed the word
selects the desired word from the list.  In an OCR system, this
approach could simplify the proofreading process.

Search
------

Suppose you have an NFA representation of a text generated by an OCR
program.  You can construct an NFA representation of the suffixes of
the text by adding a new start state with epsilon-transitions to all
the existing states.  Now, if you have an NFA representation of some
other word (generated by speech or handwriting recognition, for
example) that you want to find in this document, you can simulate the
two NFAs in parallel to find which points in the document might
contain it.  I suspect this could be very useful.  Given
infinite-order Markov models of the text --- that is, NFAs with
probabilities on the transitions --- you could even include the
likelihood of a match in the inputs to a TF/IDF-like ranking
algorithm, so that less certain matches ranked lower.

Related work
------------

Presumably most work in OCR, neural networks, speech recognition,
handwriting recognition, etc., deals with tolerance of ambiguity, and
I wouldn't be surprised if there were a bunch of work dealing with
uses for ambiguous final products.

The Soundex scheme, now nearly a century old, reduces similar-sounding
words to small equivalence classes by omitting all vowels, lumping
consonants into six classes of similar-sounding letters, and
collapsing runs of the same consonant class down into one; the idea is
to support search by removing most of the random information
introduced into surnames by spelling variation, as well as some of the
signal.
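The Soundex reduction just described can be sketched in a few lines.
This is a simplified version: it treats h and w like vowels, ignoring
the rule in full Soundex that h and w, unlike vowels, do not separate
duplicate codes.

```python
# Map consonants to the six classic Soundex classes; vowels and
# h, w, y get no code at all.
CODES = {}
for digit, letters in enumerate(['bfpv', 'cgjkqsxz', 'dt', 'l', 'mn', 'r'], 1):
    for letter in letters:
        CODES[letter] = str(digit)

def soundex(word):
    """Reduce a word to a Soundex heading: keep the first letter,
    code the remaining consonants, collapse adjacent runs of the
    same class, pad with zeros, and truncate to four characters."""
    word = word.lower()
    result = word[0].upper()
    prev = CODES.get(word[0], '')
    for ch in word[1:]:
        code = CODES.get(ch, '')
        if code and code != prev:
            result += code
        prev = code   # an uncoded letter ('' here) breaks a run
    return (result + '000')[:4]

print(soundex('slice'))   # prints S420
```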
You could view a Soundex heading such as "S420" as an NFA (/^s[aehiouwy]*(l[aehiouwy]*)+([cgjkqsxz][aehiouwy]*)+$/ I think, producing words like "slice"), so it's pretty closely related in some sense.
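A minimal sketch of the parallel simulation from the Search section,
specialized to the simplest kind of NFA: a chain with one candidate
set per position and no weights.  Trying every starting offset plays
the role of the new start state's epsilon-transitions into all the
existing states; the document and query below are invented examples.

```python
def find_matches(doc, query):
    """Find offsets in an ambiguous document (one candidate set per
    position) where an ambiguous query could match: at each offset,
    every aligned pair of candidate sets must intersect.  This is
    the parallel simulation of two chain-shaped NFAs."""
    hits = []
    for start in range(len(doc) - len(query) + 1):
        if all(doc[start + i] & query[i] for i in range(len(query))):
            hits.append(start)
    return hits

# Ambiguous OCR output for "the cat", with some uncertain symbols:
doc = [{'t', 'l'}, {'h', 'b'}, {'e'}, {' '}, {'c', 'e'}, {'a'}, {'t', 'l'}]
query = [{'c'}, {'a'}, {'t', 'l'}]   # e.g. from handwriting recognition
print(find_matches(doc, query))      # prints [4]
```

With probabilities attached to the candidates, the same loop could
accumulate a likelihood for each hit instead of a yes/no answer,
which is what the TF/IDF-style ranking above would consume.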

