[
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827721#comment-15827721
]
Dawid Weiss commented on LUCENE-7639:
-------------------------------------
Nothing is wrong with the FST, really. A suffix array is functionally
equivalent to a suffix tree, which you could build and encode as an FST. Then
any infix matching would be done similarly to suffix array-based lookups. These
are all interchangeable concepts. The way we use FSTs in Lucene is just one
particular application to solve one particular problem (essentially implement
an efficient partial prefix trie), hence my remark on the possibility of using
two automata to make an intersection of wildcards possible.
As for your patch, I really don't know how much of a problem (in typical use)
prefix and infix wildcards are. While leading wildcards are heavily used for
many things (suggestions, etc.) I can't quite remember when I needed a prefix
or infix wildcard.
In other words -- don't know whether there'll be much interest in maintaining
this code if it went into Lucene, especially considering the implementation
details you mentioned (amount of memory it requires at runtime, separate
construction process, etc.). Let's see what others think though.
> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
> Key: LUCENE-7639
> URL: https://issues.apache.org/jira/browse/LUCENE-7639
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Yakov Sirotkin
> Attachments: suffix-array.patch
>
>
> If query term starts with asterisks FST checks all words in the dictionary so
> request processing speed falls down. This problem can be solved with Suffix
> Array approach. Luckily, Suffix Array can be constructed after Lucene start
> from existing index. Unfortunately, Suffix Arrays requires a lot of RAM so we
> can use it only when special flag is set:
> -Dsolr.suffixArray.enable=true
> It is possible to speed up Suffix Array initialization using several
> threads, so we can control number of threads with
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted, the default value is 5.
> Attached patch is the suggested implementation for SuffixArray support, it
> works for all terms starting with asterisks with at least 3 consequent
> non-wildcard characters. This patch do not change search results and affects
> only performance issues.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]