Refactor of spell check
-----------------------
Key: LUCENE-537
URL: http://issues.apache.org/jira/browse/LUCENE-537
Project: Lucene - Java
Type: Improvement
Reporter: Karl Wettin
I use the same ngram index for multiple categories, but only want to spell
check per category. The old implementation did not support this, as it took
docFreq from the whole index as its only frequency source.
The spell check returns suggestions with a score, not just the suggested word.
TokenFrequencyVector replaces the IndexReader used for docFreq.
LuceneTokenFrequencyVector wraps an IndexReader and works just as the old
implementation.
LuceneQueryDictionary creates an ngram dictionary based on a query and not the
whole index.
MultiTokenFrequencyVector treats a number of TokenFrequencyVectors as one,
i.e. for use when spell checking in multiple contexts.
TokenFrequencyVectorMap is a HashMap facade. It comes with a static factory to
create the vector based on the tokens in a specific field from a search.
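To make the relationship between these classes concrete, here is a minimal
sketch in modern Java. The class names come from this issue, but every method
signature is an assumption made for illustration and is not taken from the
patch: frequency(String), fromTokens(List), and the choice to combine vectors by
summing are all hypothetical. A reader-backed vector such as
LuceneTokenFrequencyVector would presumably answer the same question from
IndexReader.docFreq instead of a map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the abstraction: how often does a token occur in one context?
interface TokenFrequencyVector {
    int frequency(String token);
}

// HashMap facade in the spirit of TokenFrequencyVectorMap (signatures are assumptions).
class MapTokenFrequencyVector implements TokenFrequencyVector {
    private final Map<String, Integer> counts = new HashMap<>();

    // Stand-in for the static factory: counts every occurrence of each token.
    // The real factory is described as reading tokens of one field from a search.
    static MapTokenFrequencyVector fromTokens(List<String> tokens) {
        MapTokenFrequencyVector vector = new MapTokenFrequencyVector();
        for (String token : tokens) {
            vector.counts.merge(token, 1, Integer::sum);
        }
        return vector;
    }

    @Override
    public int frequency(String token) {
        return counts.getOrDefault(token, 0);
    }
}

// Treats several vectors as one. Summing the parts is an assumption.
class MultiTokenFrequencyVector implements TokenFrequencyVector {
    private final List<TokenFrequencyVector> parts;

    MultiTokenFrequencyVector(List<TokenFrequencyVector> parts) {
        this.parts = parts;
    }

    @Override
    public int frequency(String token) {
        int sum = 0;
        for (TokenFrequencyVector part : parts) {
            sum += part.frequency(token);
        }
        return sum;
    }
}
```

With one map-backed vector per category, a spell check limited to some
categories only needs a MultiTokenFrequencyVector over those vectors; a token
that occurs in none of them reports a frequency of 0.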
I use the TokenFrequencyVectorMap to build one vector per category and
instantiate a MultiTokenFrequencyVector for each user query. Could probably
save a couple of clock ticks by buffering MultiVectors rather than creating new
ones all the time.
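One way the buffering idea could look, as a sketch only: keep the combined
vector for each distinct set of categories in a concurrent map and reuse it
across queries. Nothing here is from the patch. The frequency(String) method,
SummingVector and MultiVectorCache are hypothetical names, and the interface is
declared inside the block so that it compiles on its own. The sketch also
assumes the per-category vectors do not change while the cache is alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface, declared here so the sketch compiles on its own.
interface TokenFrequencyVector {
    int frequency(String token);
}

// Combines several vectors by summing their frequencies (an assumption).
class SummingVector implements TokenFrequencyVector {
    private final List<TokenFrequencyVector> parts;

    SummingVector(List<TokenFrequencyVector> parts) {
        this.parts = parts;
    }

    @Override
    public int frequency(String token) {
        int sum = 0;
        for (TokenFrequencyVector part : parts) {
            sum += part.frequency(token);
        }
        return sum;
    }
}

// Reuses one combined vector per distinct set of categories.
class MultiVectorCache {
    private final Map<String, TokenFrequencyVector> perCategory;
    private final Map<Set<String>, TokenFrequencyVector> combined = new ConcurrentHashMap<>();

    MultiVectorCache(Map<String, TokenFrequencyVector> perCategory) {
        this.perCategory = perCategory;
    }

    TokenFrequencyVector forCategories(Set<String> categories) {
        // Defensive copy: Set equality ignores order, so {a, b} and {b, a} share an entry.
        Set<String> key = new TreeSet<>(categories);
        return combined.computeIfAbsent(key, k -> {
            List<TokenFrequencyVector> parts = new ArrayList<>();
            for (String category : k) {
                TokenFrequencyVector vector = perCategory.get(category);
                if (vector != null) { // unknown categories are skipped in this sketch
                    parts.add(vector);
                }
            }
            return new SummingVector(parts);
        });
    }
}
```

Whether this saves anything in practice depends on how many distinct category
combinations occur; with many rarely repeated combinations the cache would
need some eviction policy, which the sketch leaves out.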
Also, it seems the ngram code might not be thread safe. This also includes the
source in CVS. I have not succeeded in proving it when testing, only in the
live environment. Each instance of Spellchecker only suggests once, and it takes
quite some resources to create new instances of the spellchecker as it is
designed today. Might get back to that subject.