[ https://issues.apache.org/jira/browse/LUCENE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5603:
--------------------------------
Attachment: LUCENE-5603.patch
Here's a patch.
Reusing my previous benchmark (with Polish, see the last comment on SOLR-3245),
indexing speed increases from 2400 docs/second to 2900 docs/second. So it's not
much of a relative increase in speed (about 21%, due to some properties of this
dictionary), but I still think it's worth it. And of course it's much better
than the 71 docs/second in Lucene 4.7...
> fix hunspell to use FST efficiently
> -----------------------------------
>
> Key: LUCENE-5603
> URL: https://issues.apache.org/jira/browse/LUCENE-5603
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-5603.patch
>
>
> Previously this was 3 hashes (prefixes, words, suffixes), and it tried to
> split the words in various ways and do lookups. This was changed to an FST, but
> the algorithm wasn't adjusted to use it properly (e.g. single pass, terminate
> when it reaches a "dead end").
> This makes for slower indexing when using this stemmer...
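
To illustrate the "single pass, terminate at a dead end" idea from the description, here is a minimal sketch. It is not the code in the attached patch and does not use Lucene's FST API: a plain in-memory trie stands in for the FST, and the class and method names (PrefixTrieSketch, add, matchingPrefixLengths) are hypothetical. It only shows why one walk over the word can replace trying every split point with a separate hash lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a plain trie stands in for the FST; this is not Lucene's API. */
final class PrefixTrieSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean accepting; // true if the path to this node spells a known prefix
    }

    private final Node root = new Node();

    /** Registers a known prefix (assumed to be non-null). */
    void add(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.next.computeIfAbsent(prefix.charAt(i), c -> new Node());
        }
        node.accepting = true;
    }

    /** Returns the lengths of all known prefixes of word, found in a single pass. */
    List<Integer> matchingPrefixLengths(String word) {
        List<Integer> lengths = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.get(word.charAt(i));
            if (node == null) {
                break; // dead end: no longer prefix can match, so stop scanning
            }
            if (node.accepting) {
                lengths.add(i + 1);
            }
        }
        return lengths;
    }
}
```

With "un", "under" and "re" added, matchingPrefixLengths("understand") visits at most six characters and reports both matches, whereas a hash-based approach would hash and look up every candidate split of the word.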