On 31 Mar 2006, at 06:54, karl wettin wrote:
I've been working a bit with the spell checker. It does a pretty
good job when it comes to finding a simple typo.
I was thinking it would be nice if I could turn "heros light and
magic" into "did you mean: heroes of might and magic?".
My strategy is to combine Markov, A* and Levenshtein.
Any comments on this? Questions?
Nothing? Not even a go-go-go? I would really like to discuss it with
someone before I spend too much time on it. This is the idea: a
simple Markov chain is similar to n-grams, but on the word level
rather than the character level. A* is a classic path-finding
algorithm from game programming that finds the cheapest path through
a graph or grid. I assume you all know Levenshtein distance from
FuzzyQuery.
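
To make the idea concrete, here is a minimal sketch of how the three
pieces might fit together. It is Python rather than Java for brevity,
and every name in it (build_chain, suggest, insert_cost, the "<s>"
start marker) is made up for illustration, not taken from the spell
checker. The search uses a zero heuristic, so strictly speaking it is
uniform-cost search, the degenerate case of A*; a real heuristic would
have to be designed separately. The cost model (edit distance plus the
negative log of the transition probability) is also only an assumption.

```python
import heapq
import math

START = "<s>"  # hypothetical marker for the beginning of a phrase


def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                # delete ca
                           row[j - 1] + 1,                 # insert cb
                           prev_row[j - 1] + (ca != cb)))  # substitute
        prev_row = row
    return prev_row[-1]


def build_chain(phrases):
    """Word-level Markov chain: chain[word][next_word] = count."""
    chain = {}
    for phrase in phrases:
        words = [START] + phrase.split()
        for prev, nxt in zip(words, words[1:]):
            followers = chain.setdefault(prev, {})
            followers[nxt] = followers.get(nxt, 0) + 1
    return chain


def suggest(query, chain, insert_cost=2.0):
    """Cheapest walk through the chain that accounts for every typed word.

    A state is (typed words consumed, last chain word). With a zero
    heuristic this is uniform-cost search, i.e. A* with h = 0.
    """
    typed = query.split()
    frontier = [(0.0, 0, START, ())]
    visited = set()
    while frontier:
        cost, pos, prev, path = heapq.heappop(frontier)
        if pos == len(typed):
            return " ".join(path)
        if (pos, prev) in visited:
            continue
        visited.add((pos, prev))
        followers = chain.get(prev, {})
        total = sum(followers.values())
        for word, count in followers.items():
            step = -math.log(count / total)  # unlikely transitions cost more
            # Either the chain word replaces the next typed word ...
            heapq.heappush(frontier, (cost + step + levenshtein(typed[pos], word),
                                      pos + 1, word, path + (word,)))
            # ... or it is a word the user left out entirely.
            heapq.heappush(frontier, (cost + step + insert_cost,
                                      pos, word, path + (word,)))
    return None  # no walk through the chain covers the whole query


chain = build_chain(["heroes of might and magic", "lord of the rings"])
print(suggest("heros light and magic", chain))
# prints: heroes of might and magic
```

The fixed insert_cost is what lets the search put back the missing
"of"; how to weigh it against edit distance is an open tuning question.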
I have slept on this a bit and think it might not work on a big
corpus. One probably has to limit it to one Markov chain per context
of some kind, say per category.
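
A rough sketch of what "one chain per context" could look like,
assuming each phrase in the corpus comes labelled with a category.
The function name, the "<s>" start marker, and the category labels
are all hypothetical.

```python
def build_chains_by_category(labelled_phrases):
    """Return chains[category][word][next_word] = count."""
    chains = {}
    for category, phrase in labelled_phrases:
        chain = chains.setdefault(category, {})
        words = ["<s>"] + phrase.split()  # "<s>" marks the phrase start
        for prev, nxt in zip(words, words[1:]):
            followers = chain.setdefault(prev, {})
            followers[nxt] = followers.get(nxt, 0) + 1
    return chains


chains = build_chains_by_category([
    ("games", "heroes of might and magic"),
    ("films", "lord of the rings"),
])
print(chains["games"]["of"])  # only the followers seen in the games category
```

Each chain then only knows the transitions seen in its own category,
which keeps it small but means the category must be known or guessed
before a suggestion can be made.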
Perhaps there is some other forum, more focused on text analysis,
that you could recommend?