You were right about it not working on a big corpus. I think there is a limit on the size of the query, and a big corpus would exceed it.
I am looking at something similar myself, but I am still going through the basics.

Rgds
Prabhu

On 4/3/06, karl wettin <[EMAIL PROTECTED]> wrote:
>
> On 31 Mar 2006, at 06:54, karl wettin wrote:
>
> > I've been working a bit with the spell checker. It does a pretty
> > good job when it comes to finding a simple typo.
> > I was thinking it would be nice if I could turn "heros light and
> > magic" into "did you mean: heroes of might and magic?".
> >
> > My strategy is to combine Markov, A* and Levenshtein.
> >
> > Any comments on this? Questions?
>
> Nothing? Not even a go-go-go? I would really like to discuss it with
> someone before I spend too much time on it. This is what it is: a
> simple Markov chain is similar to n-grams, but on the word level
> rather than the character level. A* is a classic pathfinding
> algorithm, popular in games, for finding the cheapest path through a
> matrix. I assume you all know Levenshtein distance from FuzzyQuery.
>
> I have been sleeping on this a bit and think it might not work on a
> big corpus. One probably has to limit it to one Markov chain per
> context of some kind, say per category.
>
> Perhaps there is some other forum, more focused on text analysis,
> that you would recommend?
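For anyone following along, here is a rough sketch of how the three pieces Karl names could fit together. It is an illustration under assumptions, not Karl's implementation: the word-level Markov chain is reduced to a set of observed bigrams, Levenshtein distance prices each spelling change, and A* searches over states (query position, previous word) for the cheapest corrected phrase. The helper names (`train_bigrams`, `suggest`) and the cost constants (`INSERT_COST`, `DELETE_COST`, `UNSEEN_BIGRAM`) are hypothetical; nothing in the thread specifies them.

```python
import heapq
import itertools
from collections import defaultdict

# Hypothetical weights; nothing in the thread specifies them.
INSERT_COST = 1      # adding a word the user left out, e.g. "of"
DELETE_COST = 3      # dropping a query word entirely
UNSEEN_BIGRAM = 2    # penalty for a word pair never seen in the corpus


def levenshtein(a, b):
    """Edit distance (insert/delete/substitute), computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def train_bigrams(sentences):
    """Word-level Markov chain, reduced to the set of observed successors per word."""
    follows = defaultdict(set)
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:-1])
        for w1, w2 in zip(words, words[1:]):
            follows[w1].add(w2)
    return follows, vocab


def suggest(query, follows, vocab):
    """A* over states (query position, previous word); returns (words, cost)."""
    tokens = query.split()
    n = len(tokens)
    words = sorted(vocab)  # fixed order so ties break deterministically

    def trans(w1, w2):
        return 0 if w2 in follows[w1] else UNSEEN_BIGRAM

    # Admissible, consistent heuristic: each unconsumed token costs at least
    # its cheapest substitution or a deletion, whichever is smaller.
    floor = [min(DELETE_COST, min(levenshtein(t, w) for w in words)) for t in tokens]
    rest = [sum(floor[i:]) for i in range(n + 1)]

    tie = itertools.count()
    heap = [(rest[0], 0, next(tie), 0, "<s>", ())]

    def push(g, pos, last, path):
        heapq.heappush(heap, (g + rest[pos], g, next(tie), pos, last, path))

    closed = set()
    while heap:
        _, g, _, pos, last, path = heapq.heappop(heap)
        if last == "</s>":
            return list(path), g
        if (pos, last) in closed:
            continue
        closed.add((pos, last))
        if pos == n:  # all tokens consumed: try ending the phrase here
            push(g + trans(last, "</s>"), n, "</s>", path)
        for w in words:
            # Insert w without consuming a query token.
            push(g + INSERT_COST + trans(last, w), pos, w, path + (w,))
            if pos < n:
                # Replace the current token by w (free if already spelled w).
                cost = levenshtein(tokens[pos], w) + trans(last, w)
                push(g + cost, pos + 1, w, path + (w,))
        if pos < n:
            push(g + DELETE_COST, pos + 1, last, path)  # drop the current token
    return None, float("inf")


corpus = ["heroes of might and magic", "might and magic", "lord of the rings"]
follows, vocab = train_bigrams(corpus)
print(suggest("heros light and magic", follows, vocab))
```

With that toy corpus the search should return `(['heroes', 'of', 'might', 'and', 'magic'], 3)`: one edit for "heroes", one insertion for "of", one edit for "might". Note that every expansion scans the whole vocabulary and the state space is (query length + 1) times the vocabulary size, so this naive version would indeed get expensive on a big corpus; restricting it to one chain per category, as Karl suggests, shrinks the vocabulary it has to scan.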