[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------

    Comment: was deleted

> Adaptive, user query session analyzing spell checker.
> -----------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: spellchecker.diff
>
>
> From javadocs:
> This is an adaptive, user query session analyzing spell checker. In plain
> words, a word and phrase dictionary that will learn from how users act while
> searching.
> Be aware, this is a beta version. It is not finished, but yields great
> results if you have enough user activity, RAM and a fairly narrow document
> corpus. The RAM problem can be fixed if you implement your own subclass of
> SpellChecker, as the abstract methods of this class are the CRUD methods. This
> will most probably change to a strategy class in a future version.
> TODO:
> 1. Gram up results to detect composite words that should not be composite
> words, and vice versa.
> 2. Train a grammed token (Markov) chain with output from an expectation
> maximization algorithm (weka clusters?) parallel to a closest path (A* or
> breadth first?) to allow contextual suggestions on queries that were never
> placed.
> Usage:
> Training
> At user query time, create an instance of QueryResults containing the query
> string, number of hits and a time stamp. Add it to a chronologically ordered
> list in the user session (LinkedList makes sense) that you pass on to
> train(sessionQueries) as the session times out.
> You also want to call the bootstrap() method every 100000 queries or so.
> Spell checking
> Call getSuggestions(query) and look at the results. Don't modify them! This
> method call will be hidden in a facade in a future version.
> Note that the spell checker is case sensitive, so you want to clean up the
> query the same way when you train as when you request the suggestions.
> I recommend something like
> query = query.toLowerCase().replaceAll("  ", " ").trim()
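
The training and clean-up advice above can be sketched roughly as follows. This is only an illustration and not code from spellchecker.diff: SessionQueryLog, LoggedQuery and Trainer are hypothetical stand-ins for the patch's QueryResults and train(sessionQueries), whose real signatures are not shown in this message. The clean-up also swaps the suggested replaceAll for a "\\s+" regex so that longer runs of whitespace collapse in one pass, which is an assumption about the intent.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

/** Hypothetical per-session buffer; not part of the attached patch. */
class SessionQueryLog {

    /** Stand-in for QueryResults: query string, number of hits, time stamp. */
    static final class LoggedQuery {
        final String query;
        final int hits;
        final long timestampMillis;

        LoggedQuery(String query, int hits, long timestampMillis) {
            this.query = query;
            this.hits = hits;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Stand-in for whatever receives a finished session, e.g. train(sessionQueries). */
    interface Trainer {
        void train(List<LoggedQuery> sessionQueries);
    }

    // Entries are appended at query time, so the list stays in chronological order.
    private final LinkedList<LoggedQuery> queries = new LinkedList<>();

    /** Lower-case, collapse every whitespace run to one space, trim the ends. */
    static String normalize(String rawQuery) {
        return rawQuery.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    /** Call once per user query, with the same clean-up used when asking for suggestions. */
    void record(String rawQuery, int hits, long timestampMillis) {
        queries.add(new LoggedQuery(normalize(rawQuery), hits, timestampMillis));
    }

    /** Call when the user session times out; hands over a copy and resets the buffer. */
    void flushTo(Trainer trainer) {
        if (!queries.isEmpty()) {
            trainer.train(new ArrayList<>(queries));
            queries.clear();
        }
    }

    int size() {
        return queries.size();
    }

    public static void main(String[] args) {
        SessionQueryLog log = new SessionQueryLog();
        log.record("  Heat   PUMP ", 0, System.currentTimeMillis());
        log.record("heat pump", 42, System.currentTimeMillis());
        // A real Trainer would forward the list to the spell checker's train method.
        log.flushTo(session -> {
            for (LoggedQuery q : session) {
                System.out.println(q.query + " (" + q.hits + " hits)");
            }
        });
    }
}
```

The same normalize method would be applied to the string passed to getSuggestions(query), so that training data and lookups agree on case and spacing.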