Adaptive, user query session analyzing spell checker.
-----------------------------------------------------
Key: LUCENE-626
URL: http://issues.apache.org/jira/browse/LUCENE-626
Project: Lucene - Java
Type: New Feature
Components: Search
Reporter: Karl Wettin
Priority: Minor
Attachments: spellcheck_0.0.1.tar.gz
From javadocs:
This is an adaptive, user query session analyzing spell checker. In plain
words, it is a word and phrase dictionary that learns from how users act while
searching.
Be aware, this is a beta version. It is not finished, but it yields great results
if you have enough user activity, RAM and a fairly narrow document corpus. The
RAM problem can be fixed if you implement your own subclass of SpellChecker, as
the abstract methods of this class are the CRUD methods. This will most
probably change to a strategy class in a future version.
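To illustrate why CRUD-style abstract methods make the RAM problem fixable, here is a minimal sketch. It does not reproduce the attached code: SuggestionStore, InMemorySuggestionStore and all four method names are hypothetical, chosen only to show the pattern of swapping the storage layer behind an abstract class.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical storage abstraction. The class and method names are
 * assumptions for illustration, not the API of the attached code.
 */
abstract class SuggestionStore {
    /** Stores a suggestion for a query unless one already exists. */
    abstract void create(String query, String suggestion);

    /** Returns the stored suggestion, or null if there is none. */
    abstract String read(String query);

    /** Replaces the suggestion of a query that is already stored. */
    abstract void update(String query, String suggestion);

    /** Removes the query and its suggestion. */
    abstract void delete(String query);
}

/** Keeps everything on the heap; simple, but bounded by available RAM. */
class InMemorySuggestionStore extends SuggestionStore {
    private final Map<String, String> suggestions = new HashMap<>();

    @Override
    void create(String query, String suggestion) {
        suggestions.putIfAbsent(query, suggestion);
    }

    @Override
    String read(String query) {
        return suggestions.get(query);
    }

    @Override
    void update(String query, String suggestion) {
        // Map.replace only writes when the key is already present.
        suggestions.replace(query, suggestion);
    }

    @Override
    void delete(String query) {
        suggestions.remove(query);
    }
}
```

A disk-backed subclass would override the same four methods and leave the calling code unchanged, which is presumably the kind of substitution the paragraph above has in mind.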
TODO:
1. Gram up results to detect composite words that should not be composite words,
and vice versa.
2. Train a grammed token (Markov) chain with output from an expectation
maximization algorithm (Weka clusters?) parallel to a closest path (A* or
breadth first?) to allow contextual suggestions on queries that were never placed.
Usage:
Training
At user query time, create an instance of QueryResults containing the query
string, the number of hits and a time stamp. Add it to a chronologically ordered
list in the user session (LinkedList makes sense) that you pass on to
train(sessionQueries) when the session times out.
You also want to call the bootstrap() method every 100,000 queries or so.
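The training flow described above could be wired up roughly as follows. This is only a sketch under assumptions: the QueryResults and Trainer types below are hypothetical stand-ins, not the classes in the attachment (their real constructors and signatures may differ), and UserSession and SessionTimeoutHandler are invented names for the glue code.

```java
import java.util.LinkedList;
import java.util.List;

class SessionTrainingSketch {

    /** Hypothetical stand-in for QueryResults: query string, hits, time stamp. */
    static final class QueryResults {
        final String query;
        final int hits;
        final long timestamp;

        QueryResults(String query, int hits, long timestamp) {
            this.query = query;
            this.hits = hits;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for the training side of the spell checker. */
    interface Trainer {
        void train(List<QueryResults> sessionQueries);

        void bootstrap();
    }

    /** Collects the queries of one user session in chronological order. */
    static final class UserSession {
        private final LinkedList<QueryResults> queries = new LinkedList<>();

        void record(String query, int hits, long timestamp) {
            // Appending at query time keeps the list chronologically ordered.
            queries.addLast(new QueryResults(query, hits, timestamp));
        }

        List<QueryResults> queries() {
            return queries;
        }
    }

    /** Trains on timed-out sessions and bootstraps after enough queries. */
    static final class SessionTimeoutHandler {
        private final Trainer trainer;
        private final long bootstrapInterval;
        private long queriesSinceBootstrap = 0;

        SessionTimeoutHandler(Trainer trainer, long bootstrapInterval) {
            this.trainer = trainer;
            this.bootstrapInterval = bootstrapInterval;
        }

        void onSessionTimeout(UserSession session) {
            List<QueryResults> sessionQueries = session.queries();
            if (sessionQueries.isEmpty()) {
                return; // nothing to learn from
            }
            trainer.train(sessionQueries);
            queriesSinceBootstrap += sessionQueries.size();
            if (queriesSinceBootstrap >= bootstrapInterval) {
                trainer.bootstrap();
                queriesSinceBootstrap = 0;
            }
        }
    }
}
```

In a web application, onSessionTimeout would typically be called from whatever hook the container offers for expired sessions, with the bootstrap interval set to something like 100000.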
Spell checking
Call getSuggestions(query) and look at the results. Don't modify them! This
method call will be hidden in a facade in a future version.
Note that the spell checker is case sensitive, so you want to clean up the query
the same way when you train as when you request suggestions.
I recommend something like
query = query.toLowerCase().replaceAll("\\s+", " ").trim()
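One way to make sure training and lookup clean up queries identically is to route both through a single helper. The sketch below is an illustration, not part of the attached code: QueryNormalizer and normalize are hypothetical names, collapsing every run of whitespace to one space is an assumption about the intent of the recommendation above, and Locale.ROOT is added so that lowercasing does not depend on the JVM's default locale.

```java
import java.util.Locale;

/** Hypothetical helper shared by the training and the suggestion code paths. */
class QueryNormalizer {

    /** Lowercases the query, collapses whitespace runs and trims the ends. */
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        // prints "apache lucene"
        System.out.println(normalize("  Apache   LUCENE "));
    }
}
```

Calling normalize on the query string before building QueryResults and again before getSuggestions(query) keeps the two sides consistent.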