[
https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916787#action_12916787
]
Robert Muir commented on LUCENE-2507:
-------------------------------------
By the way, out of curiosity I tested an alternative configuration:
DirectSpellChecker with .setMaxEdits(1).
With this "lighter" configuration:
||impl||Number correct (out of 547)||Number correct, inverted (out of 547)||Avg time in ms||
|DirectSpellChecker(n=1)|165|432|1.83ms|
So here you have the flexibility to get essentially the same performance as
the existing spellchecker, and the false positive rate is hugely reduced (in
this contrived test). The trade-off is that you catch only 77% of the
suggestions that the old spellchecker does... but this might be good for
setups that feel the n=2 default is too aggressive.
And again, like the original configuration, you have no index to rebuild at all.
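
To make the n=1 versus n=2 trade-off concrete, here is a minimal sketch of
what the edit bound does to the candidate set. It is only an illustration: it
scans a small in-memory term list with a plain dynamic-programming Levenshtein
distance, which is not how the automaton-based patch works, and the class
name, method names, and sample terms are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: brute-force scan over a term list, not the automaton-based implementation. */
class MaxEditsSketch {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the terms within maxEdits of the query, skipping exact matches. */
    static List<String> suggest(String query, List<String> terms, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            int distance = levenshtein(query, term);
            if (distance > 0 && distance <= maxEdits) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical terms standing in for the terms of a real index.
        List<String> terms = Arrays.asList("lucene", "lucent", "license", "lumen");
        System.out.println(suggest("lucenw", terms, 1)); // [lucene, lucent]
        System.out.println(suggest("lucenw", terms, 2)); // [lucene, lucent, lumen]
    }
}
```

With the bound at 1 only the two closest terms come back; raising it to 2 also
admits "lumen", which is the kind of extra candidate that can show up as a
false positive.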
> automaton spellchecker
> ----------------------
>
> Key: LUCENE-2507
> URL: https://issues.apache.org/jira/browse/LUCENE-2507
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/spellchecker
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch,
> LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries
> this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an
> algorithm such as Levenshtein.
> Alternatively, we could just run a Levenshtein query directly against the
> index; then we wouldn't need a separate index to rebuild.
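
For readers unfamiliar with the n-gram approach described in the issue
summary above, here is a rough sketch of its two steps: look up candidates in
a separate gram-to-terms structure, then re-rank them by Levenshtein distance.
This is a simplified illustration and does not reproduce the contrib
spellchecker's actual gram sizes or scoring; every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: a toy n-gram index with Levenshtein re-ranking. */
class NGramRerankSketch {

    /** Overlapping character n-grams, e.g. "lucene" with n=2 gives lu, uc, ce, en, ne. */
    static Set<String> grams(String term, int n) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    /** The separate structure that has to be rebuilt whenever the terms change. */
    static Map<String, Set<String>> buildIndex(Collection<String> terms, int n) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String term : terms) {
            for (String gram : grams(term, n)) {
                index.computeIfAbsent(gram, k -> new TreeSet<>()).add(term);
            }
        }
        return index;
    }

    /** Classic dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Step 1: gather terms sharing at least one gram with the query. Step 2: re-rank by distance. */
    static List<String> suggest(String query, Map<String, Set<String>> index, int n) {
        Set<String> candidates = new TreeSet<>();
        for (String gram : grams(query, n)) {
            Set<String> hits = index.get(gram);
            if (hits != null) {
                candidates.addAll(hits);
            }
        }
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((x, y) -> {
            int byDistance = Integer.compare(levenshtein(query, x), levenshtein(query, y));
            return byDistance != 0 ? byDistance : x.compareTo(y); // ties broken alphabetically
        });
        return ranked;
    }
}
```

The map returned by buildIndex is the piece that goes stale as terms are added
or removed, which is the rebuild cost the direct Levenshtein query avoids.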