[ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3888:
--------------------------------

    Attachment: LUCENE-3888.patch

Here is a simple prototype of what I was suggesting: it allows you to specify 
an Analyzer for SpellChecker.

This Analyzer converts the 'surface form' into the 'analyzed form' at both 
index and query time. At index time, the n-grams are formed from the analyzed 
form, but the surface form is stored for retrieval.
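
As a rough illustration of the index-time step (a minimal sketch, not code 
from the patch): the plain-Java class below has no Lucene dependency. The 
names IndexTimeSketch, Entry, ngrams and index are hypothetical, and the 
`analyzer` function is only a stand-in for running a real Analyzer over the 
term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative only: every name here is hypothetical, none comes from the patch.
class IndexTimeSketch {

    // One dictionary entry: n-grams built from the analyzed form,
    // with the surface form kept so it can be returned as a suggestion.
    static final class Entry {
        final String surface;
        final List<String> grams;

        Entry(String surface, List<String> grams) {
            this.surface = surface;
            this.grams = grams;
        }
    }

    // All character n-grams of length n; empty if the text is shorter than n.
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // Analyze the surface form, gram the result, store the surface form as-is.
    static Entry index(String surface, Function<String, String> analyzer, int n) {
        String analyzed = analyzer.apply(surface);
        return new Entry(surface, ngrams(analyzed, n));
    }
}
```

With a lowercasing function as the stand-in analyzer and n = 2, indexing 
"Lucene" would keep "Lucene" as the stored surface form while the grams come 
from "lucene".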

At query time the process is similar: the docFreq() etc. checks are done on 
the surface form, but the actual spellchecking is done on the analyzed form.

The default Analyzer is null, which means do nothing. The patch does not 
include tests, refactoring, or anything like that.
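
The query-time behaviour described above could be sketched roughly as 
follows. This is an assumption-laden toy, not the patch: QueryTimeSketch and 
its methods are hypothetical names, the docFreq lookup is modelled as a 
simple map keyed by surface form, and candidates are matched by shared 
bigrams rather than by the string-distance ranking a real spell checker 
would use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative only: every name here is hypothetical, none comes from the patch.
class QueryTimeSketch {
    // Stand-in for an Analyzer; null means "do nothing".
    private final Function<String, String> analyzer;
    // Surface form -> document frequency (a toy replacement for docFreq()).
    private final Map<String, Integer> docFreqBySurface = new LinkedHashMap<>();

    QueryTimeSketch(Function<String, String> analyzer) {
        this.analyzer = analyzer;
    }

    String analyze(String surface) {
        return analyzer == null ? surface : analyzer.apply(surface);
    }

    static Set<String> bigrams(String text) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    void add(String surface, int docFreq) {
        docFreqBySurface.put(surface, docFreq);
    }

    List<String> suggest(String query) {
        List<String> suggestions = new ArrayList<>();
        // Existence check uses the surface form exactly as typed.
        if (docFreqBySurface.getOrDefault(query, 0) > 0) {
            return suggestions;
        }
        // Matching uses the analyzed form of both query and dictionary words.
        Set<String> queryGrams = bigrams(analyze(query));
        for (String surface : docFreqBySurface.keySet()) {
            Set<String> shared = bigrams(analyze(surface));
            shared.retainAll(queryGrams);
            if (!shared.isEmpty()) {
                suggestions.add(surface); // return the surface form, not the analyzed one
            }
        }
        return suggestions;
    }
}
```

With a lowercasing stand-in analyzer, a query such as "LUCENE" can still be 
matched against a dictionary word stored as "Lucene"; with a null analyzer 
the two share no bigrams and nothing is suggested.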

                
> split off the spell check word and surface form in spell check dictionary
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-3888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3888
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3888.patch, LUCENE-3888.patch
>
>
> The "did you mean?" feature built on Lucene's spell checker unfortunately 
> does not work well for Japanese, and this is a longstanding problem. The 
> logic needs comparatively long text to check spelling, but in some languages 
> (e.g. Japanese) most words are too short for the spell checker to be useful.
> I think that, at least for Japanese, things can be improved if we split off 
> the spell check word and surface form in the spell check dictionary. Then we 
> could, for example, use ReadingAttribute for spell checking but 
> CharTermAttribute for suggesting.


        

