[jira] Commented: (LUCENE-537) Refactor of spell check

Karl Wettin (JIRA) Fri, 31 Mar 2006 17:13:58 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-537?page=comments#action_12372746 ]


Karl Wettin commented on LUCENE-537:
------------------------------------

This just came up on Lucene-users and might explain what I thought was a thread 
safety problem. I'll take a look and update my refactored code some time soon.

        From:     [EMAIL PROTECTED]
        Subject:  Spellchecker bug (or feature?)
        Date:     Saturday 1 Apr 2006 00:20:08 GMT+02:00
        To:       [email protected]
        Reply-To: [email protected]

Not sure if this is the right place to report this issue:

  The accuracy value, which can be set via setAccuracy(), is being modified in
  SpellChecker.java when a word is checked. As a result, the "min" may be pushed
  very high, and the spell checker will not suggest anything for later requests.

  One workaround would be to call setAccuracy() each time before a word is
  checked. I'm not sure if this is a feature (intended behavior) or a bug.
  By the way, I'm using spellchecker 1.9.1 that comes with Lucene 1.9.1.

  Thanks,

  Xiaocheng
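
To make the report above concrete, here is a minimal sketch of how a threshold stored in an instance field can drift upward between calls. The email does not quote SpellChecker.java, so everything here is an assumption for illustration: the class SketchChecker, its methods suggestLeaky() and suggestSafe(), and the rule "raise the threshold to the last accepted score" are hypothetical and are not the Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spell checker; NOT the Lucene SpellChecker class.
class SketchChecker {
    private float accuracy = 0.5f; // threshold configured by the caller

    void setAccuracy(float accuracy) { this.accuracy = accuracy; }
    float getAccuracy() { return accuracy; }

    // Leaky variant: the running minimum is written back to the field,
    // so the configured accuracy changes as a side effect of a lookup.
    List<String> suggestLeaky(String[] candidates, float[] scores) {
        List<String> accepted = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            if (scores[i] < accuracy) continue;
            accepted.add(candidates[i]);
            accuracy = scores[i]; // assumed bug: mutates shared state
        }
        return accepted;
    }

    // Safe variant: the running minimum lives in a local variable,
    // so the field keeps the value the caller configured.
    List<String> suggestSafe(String[] candidates, float[] scores) {
        float min = accuracy;
        List<String> accepted = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            if (scores[i] < min) continue;
            accepted.add(candidates[i]);
            min = scores[i];
        }
        return accepted;
    }
}
```

In this model, one suggestLeaky() call that accepts a candidate scored 0.9 leaves the threshold at 0.9, so a later candidate scored 0.7 is rejected. Calling setAccuracy() before every lookup, as suggested above, resets the field. A field mutated during lookups would also be unsafe to share between threads.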


> Refactor of spell check
> -----------------------
>
>          Key: LUCENE-537
>          URL: http://issues.apache.org/jira/browse/LUCENE-537
>      Project: Lucene - Java
>         Type: Improvement
>     Reporter: Karl Wettin
>  Attachments: lucene_spellcheck.tar.gz
>
> I use the same ngram index for multiple categories, but only want to spell 
> check per category. The old implementation did not support this as it used 
> docFreq as controller source.
> The spell check returns suggestions with score and not just the suggested 
> word.
> TokenFrequencyVector replaces the IndexReader used for docFreq. 
> LuceneTokenFrequencyVector wraps an IndexReader and works just as the old 
> implementation.
> LuceneQueryDictionary creates an ngram dictionary based on a query and not 
> the whole index.
> MultiTokenFrequencyVector treats a number of TokenFrequencyVectors as one, 
> i.e. for use when spell checking in multiple contexts.
> TokenFrequencyVectorMap is a HashMap facade. It comes with a static factory to 
> create the vector based on the tokens in a specific field from a search.
> I use the TokenFrequencyVectorMap to build one vector per category and 
> instantiate a MultiTokenFrequencyVector for each user query. Could probably 
> save a couple of clock ticks by buffering MultiVectors rather than creating 
> new ones all the time.
> Also, it seems the ngram code might not be thread safe. This also includes 
> the source in CVS. I have not succeeded in proving it when testing, only in 
> the live environment. Each instance of Spellchecker only suggests once. And it 
> takes quite some resources to create new instances of the spellchecker as it 
> is designed today. Might get back on that subject.
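
The relationship between the classes named in the description can be sketched roughly as follows. The class names come from the issue text, but the attachment's real signatures are not shown here, so the method names frequency() and add(), the constructor, and the choice to sum frequencies across vectors are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed shape: a source of token frequencies that stands in for
// IndexReader.docFreq(). The method name is hypothetical.
interface TokenFrequencyVector {
    int frequency(String token);
}

// HashMap-backed vector, e.g. one per category (assumed API).
class TokenFrequencyVectorMap implements TokenFrequencyVector {
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one more occurrence of the token.
    void add(String token) { counts.merge(token, 1, Integer::sum); }

    public int frequency(String token) { return counts.getOrDefault(token, 0); }
}

// Treats several vectors as one. Summing is an assumption; the real
// class may combine frequencies differently.
class MultiTokenFrequencyVector implements TokenFrequencyVector {
    private final List<TokenFrequencyVector> vectors;

    MultiTokenFrequencyVector(List<TokenFrequencyVector> vectors) {
        this.vectors = vectors;
    }

    public int frequency(String token) {
        int sum = 0;
        for (TokenFrequencyVector v : vectors) sum += v.frequency(token);
        return sum;
    }
}
```

Under these assumptions, a query spanning two categories would wrap the two per-category maps in one MultiTokenFrequencyVector and ask it for frequencies. Since the wrapper only holds references, caching it per category combination is cheap.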

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
