[
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826191#comment-15826191
]
Yakov Sirotkin commented on LUCENE-7639:
----------------------------------------
bq. I'd still be curious whether the thing can be done on a finite state
automaton alone.
That was my starting point: I traced the FST to figure out what goes wrong
with leading asterisks and how it could be improved. But I suspect the FST is so
well tuned that any attempt to speed up leading asterisks would slow down
every other kind of request. The leading asterisk is the FST's Achilles'
heel: it iterates over all words in the index and says, "Oops, this is the
wrong word, let's try the next one!"
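
To make the cost concrete, here is a minimal sketch of what a leading-asterisk match amounts to when the dictionary is only organized by prefix. It is an illustration, not Lucene code: the class name LeadingWildcardScan and the method matchSuffix are hypothetical, and a plain sorted list stands in for the term dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not how Lucene is implemented.
final class LeadingWildcardScan {

    // Returns every term that ends with the given suffix, i.e. matches "*suffix".
    // A prefix-ordered dictionary offers no shortcut here, because a matching
    // term may begin with any character, so every term has to be inspected.
    static List<String> matchSuffix(List<String> sortedTerms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : sortedTerms) {      // visits all terms
            if (term.endsWith(suffix)) {       // "wrong word, try the next one"
                hits.add(term);
            }
        }
        return hits;
    }
}
```

The work grows linearly with the number of terms regardless of how few of them match, which is the behaviour described above.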
> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
> Key: LUCENE-7639
> URL: https://issues.apache.org/jira/browse/LUCENE-7639
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Yakov Sirotkin
> Attachments: suffix-array.patch
>
>
> If a query term starts with an asterisk, the FST checks every word in the
> dictionary, so request processing slows down. This problem can be solved with
> a Suffix Array approach. Luckily, a Suffix Array can be constructed from the
> existing index after Lucene starts. Unfortunately, Suffix Arrays require a lot
> of RAM, so they are used only when a special flag is set:
> -Dsolr.suffixArray.enable=true
> Suffix Array initialization can be sped up by using several threads; the
> number of threads is controlled with
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted; the default value is 5.
> The attached patch is the suggested implementation of SuffixArray support. It
> works for all terms that start with an asterisk and contain at least 3
> consecutive non-wildcard characters. The patch does not change search results;
> it affects only performance.
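
For readers unfamiliar with the idea, the following is a minimal sketch of a suffix array over a term dictionary. It is not the code from suffix-array.patch: the class TermSuffixArray and the method termsContaining are hypothetical names, the structure is built naively in a single thread, and it answers only the simple "*infix*" case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; names and structure are assumptions, not the patch's API.
final class TermSuffixArray {
    private final List<String> terms;
    // One entry per character of every term: {termIndex, offsetInTerm}.
    // This is where the RAM cost comes from.
    private final List<int[]> entries = new ArrayList<>();

    TermSuffixArray(List<String> terms) {
        this.terms = new ArrayList<>(terms);
        for (int t = 0; t < this.terms.size(); t++) {
            for (int off = 0; off < this.terms.get(t).length(); off++) {
                entries.add(new int[] {t, off});
            }
        }
        // Sort all suffixes lexicographically (naive: compares substrings).
        entries.sort((a, b) -> suffix(a).compareTo(suffix(b)));
    }

    private String suffix(int[] entry) {
        return terms.get(entry[0]).substring(entry[1]);
    }

    // Returns the terms that contain `infix`, i.e. the matches for "*infix*".
    TreeSet<String> termsContaining(String infix) {
        // Binary search for the first suffix that is >= infix.
        int lo = 0, hi = entries.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (suffix(entries.get(mid)).compareTo(infix) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // All suffixes that start with infix are contiguous from that point.
        TreeSet<String> hits = new TreeSet<>();
        for (int i = lo; i < entries.size()
                && suffix(entries.get(i)).startsWith(infix); i++) {
            hits.add(terms.get(entries.get(i)[0]));
        }
        return hits;
    }
}
```

With terms such as "lucene", "solr", "search", "index" and "researcher", calling termsContaining("ear") finds "researcher" and "search" with one binary search plus a short walk over the matching suffixes, instead of a pass over every term. A real implementation would avoid materializing substrings during comparison and would still need to check the rest of the wildcard pattern against each candidate.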