[jira] [Updated] (LUCENE-5603) fix hunspell to use FST efficiently

Robert Muir (JIRA) Sat, 12 Apr 2014 08:56:32 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-5603:
--------------------------------

    Attachment: LUCENE-5603.patch

Here's a patch. 

Reusing my previous benchmark (with Polish; see the last comment on SOLR-3245),
indexing speed increases from 2400 docs/second to 2900 docs/second. So it's not
a large relative increase in speed (about 21%), due to some properties of this
dictionary, but I still think it's worth it. And of course it's much better
than the 71 docs/second in Lucene 4.7...


> fix hunspell to use FST efficiently
> -----------------------------------
>
>                 Key: LUCENE-5603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5603
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5603.patch
>
>
> Previously this was 3 hashes (prefixes, words, suffixes), and it tried to
> split the words in various ways and do lookups. This was changed to an FST, but
> the algorithm wasn't adjusted to use it properly (e.g. a single pass that
> terminates when it reaches a "dead end").
> This makes for slower indexing when using this stemmer...
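The difference between "split the word in various ways and do lookups" and "single pass, terminate at a dead end" can be sketched as below. This is a hypothetical illustration, not code from the patch: a plain HashMap-based character trie stands in for Lucene's FST, and the class and method names (SinglePassLookup, matchingPrefixLengths, naivePrefixLengths) are made up for the example. The naive version does one substring and hash lookup per split point; the trie version walks the word once and stops as soon as no arc matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a plain character trie standing in for an FST.
 * It is NOT the Lucene FST API; it only illustrates the traversal pattern.
 */
public class SinglePassLookup {

    /** One trie node: outgoing arcs keyed by character, plus a final flag. */
    static final class Node {
        final Map<Character, Node> arcs = new HashMap<>();
        boolean isFinal;
    }

    private final Node root = new Node();

    /** Adds an entry (e.g. a prefix affix) to the trie. */
    public void add(String entry) {
        Node node = root;
        for (int i = 0; i < entry.length(); i++) {
            node = node.arcs.computeIfAbsent(entry.charAt(i), c -> new Node());
        }
        node.isFinal = true;
    }

    /**
     * Single pass: walks the word once from the start and returns the length
     * of every stored entry that is a prefix of the word. Stops at the first
     * character with no outgoing arc (a "dead end").
     */
    public List<Integer> matchingPrefixLengths(String word) {
        List<Integer> lengths = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.arcs.get(word.charAt(i));
            if (node == null) {
                break; // dead end: no longer entry can match, so stop scanning
            }
            if (node.isFinal) {
                lengths.add(i + 1);
            }
        }
        return lengths;
    }

    /** Naive alternative: one substring plus hash lookup for every split point. */
    public static List<Integer> naivePrefixLengths(Set<String> entries, String word) {
        List<Integer> lengths = new ArrayList<>();
        for (int len = 1; len <= word.length(); len++) {
            if (entries.contains(word.substring(0, len))) {
                lengths.add(len);
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        SinglePassLookup prefixes = new SinglePassLookup();
        prefixes.add("un");
        prefixes.add("under");
        prefixes.add("re");
        System.out.println(prefixes.matchingPrefixLengths("undertake")); // [2, 5]
        System.out.println(prefixes.matchingPrefixLengths("walking"));   // []
    }
}
```

Both methods return the same matches; the difference is that the trie walk for "walking" stops after a single failed arc lookup, while the naive version still builds and hashes all seven substrings.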



--
This message was sent by Atlassian JIRA
(v6.2#6252)
