Grant,

It's definitely a dictionary-based spell checker. To flesh it out a bit: currently the document gets indexed and then analysed (bad words, repetitions, etc.); the spell check (no corrections) would be yet another step in that process. It's all read-only stuff: the document content is not modified, it's just tagged accordingly.

That said, I kind of like your idea; a token filter looks like a good candidate. As for Jazzy, is it any different from the Lucene SpellChecker (ngram based)? What really matters here is not accuracy (decent but not exceptional is fine, since there is a manual double-check of tagged docs anyway); what matters most is performance and ease of integration. Any grammar check is absolutely immaterial.

About the payload idea: I can only work with a token in a filter. I could attach something and spit it out, but what would that something be? It would have to be searchable, I assume; otherwise I could perform the check without a filter, outside the index. If it's searchable then, apart from querying, I could perhaps make the highlighter work nicely with it.
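To make the "tag, don't modify" idea concrete, here is a rough sketch of what I have in mind, in plain Java with no Lucene classes. Everything in it (SpellTagSketch, TaggedToken, the MISSPELLED byte, the [[ ]] markers) is a made-up name for illustration; in a real setup the lookup would presumably sit in a TokenFilter and the flag byte would become the payload. The dictionary is assumed to be a simple in-memory set of lower-cased words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Read-only dictionary check: tokens are tagged, the text is never modified. */
class SpellTagSketch {

    /** Hypothetical 1-byte marker, standing in for a payload on a misspelled token. */
    static final byte MISSPELLED = 1;

    /** A token with its character offsets and a flag byte (0 = found in dictionary). */
    static final class TaggedToken {
        final String term;
        final int start;
        final int end;
        final byte flag;

        TaggedToken(String term, int start, int end, byte flag) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.flag = flag;
        }
    }

    /** Splits on non-letters and flags every word missing from the dictionary. */
    static List<TaggedToken> tag(String text, Set<String> dictionary) {
        List<TaggedToken> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators, then consume one run of letters as a token.
            while (i < n && !Character.isLetter(text.charAt(i))) i++;
            int start = i;
            while (i < n && Character.isLetter(text.charAt(i))) i++;
            if (i > start) {
                String term = text.substring(start, i);
                boolean known = dictionary.contains(term.toLowerCase(Locale.ROOT));
                byte flag = known ? (byte) 0 : MISSPELLED;
                out.add(new TaggedToken(term, start, i, flag));
            }
        }
        return out;
    }

    /** The yes/no answer: stops at the first flagged token. */
    static boolean hasErrors(List<TaggedToken> tokens) {
        for (TaggedToken t : tokens) {
            if (t.flag == MISSPELLED) return true;
        }
        return false;
    }

    /** Builds a marked-up copy of the text; the original string is left as-is. */
    static String highlight(String text, List<TaggedToken> tokens) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (TaggedToken t : tokens) {
            if (t.flag != MISSPELLED) continue;
            sb.append(text, last, t.start).append("[[")
              .append(text, t.start, t.end).append("]]");
            last = t.end;
        }
        return sb.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("the", "quick", "brown", "fox"));
        String text = "The quikc brown fox";
        List<TaggedToken> tokens = tag(text, dict);
        System.out.println(hasErrors(tokens));        // true
        System.out.println(highlight(text, tokens));  // The [[quikc]] brown fox
    }
}
```

The offsets are kept on each token so that highlighting can be done from the tags alone, without a second pass over the dictionary. Whether the same tags can be made searchable in the index is exactly the open question above.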

Thx,
Mac


Grant Ingersoll-6 wrote:
>
> I think I'm missing something here...
>
> Spell checked in what sense?  Sounds to me like you need dictionary
> based spell checking during index, not index based spelling during
> search, right?
>
> How about hooking up something like the Jazzy spell checker into a
> TokenFilter?  Then, as the tokens stream by, you lookup the spelling
> and then add a 1 byte payload to all words that are misspelled.
>
> As for Highlighter, hmmm...  Not sure if there is a way to make a
> Fragmenter/Scorer that was payload aware, such that it would only
> produce fragments (and scores) for sections of the file that have
> these payloads.  Definitely pushing my area of expertise, but maybe
> one of the Highlighter experts can chime in.
>
> HTH,
> Grant
>
> On Dec 11, 2008, at 6:18 AM, Lucene User no 1981 wrote:
>
>>
>> Hi,
>>
>> the problem is as follows: there is a text, ca. 30kb, it has to be
>> spellchecked automatically, there is no manual intervention, no
>> suggestions needed. All I would like to achieve is a simple check if
>> there are any problems with the spelling or not. It has to be rather
>> fast cause there are tons of docs a minute going thru the system.
>> Solutions like SpellChecker.exists() don't really apply.
>> Additionally, spelling errors could be highlighted - haven't really
>> found any reasonable way of leveraging Highlighter for that task.
>>
>> Does anyone have any idea how this problem can be addressed with
>> Lucene?
>>
>> Regards,
>> Mac
>> --
>> View this message in context:
>> http://www.nabble.com/Spell-check-of-a-large-text-tp20953625p20953625.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

--
View this message in context: http://www.nabble.com/Spell-check-of-a-large-text-tp20953625p20973238.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.