[ 
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954727#comment-15954727
 ] 

Yakov Sirotkin edited comment on LUCENE-7639 at 4/4/17 7:41 AM:
----------------------------------------------------------------

Maybe I have an explanation of why search with a leading asterisk is not easy. Let's 
assume that you have a traditional paper address book and you are looking for 
someone with the compound surname _Zeta-Jones_. If you forget the second part, you 
can search by _Zeta_ without any problems.
But if you forget the first part, you need to check the whole address book 
looking for _Jones_; in that case the index is useless.
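
The address book analogy can be sketched in a few lines of Java. This is only an illustration, not Lucene code: the class name {{AddressBook}} and both method names are made up for the example, and a sorted array of unique names stands in for the term dictionary. A prefix lookup can binary-search to the right page and read forward, while a lookup by the trailing part has to visit every entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddressBook {
    // Prefix lookup: binary-search to the first entry >= prefix, then read
    // forward while entries still start with the prefix.
    // Assumes `sorted` is in natural String order and has no duplicates.
    static List<String> byPrefix(String[] sorted, String prefix) {
        int pos = Arrays.binarySearch(sorted, prefix);
        if (pos < 0) {
            pos = -pos - 1; // not found: convert to the insertion point
        }
        List<String> hits = new ArrayList<>();
        for (int i = pos; i < sorted.length && sorted[i].startsWith(prefix); i++) {
            hits.add(sorted[i]);
        }
        return hits;
    }

    // Suffix lookup: the sort order says nothing about how a name ends,
    // so every entry has to be checked.
    static List<String> bySuffix(String[] sorted, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String name : sorted) {
            if (name.endsWith(suffix)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] names = {"Jones", "Smith", "Zeta-Jones", "Zimmer"};
        Arrays.sort(names);
        System.out.println(byPrefix(names, "Zeta"));  // few comparisons
        System.out.println(bySuffix(names, "Jones")); // full scan
    }
}
```

The cost of {{byPrefix}} grows with the logarithm of the book size plus the number of matches; the cost of {{bySuffix}} grows with the size of the whole book.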



> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
>                 Key: LUCENE-7639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7639
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yakov Sirotkin
>         Attachments: suffix-array-2.patch, suffix-array.patch
>
>
> If a query term starts with an asterisk, the FST checks every word in the 
> dictionary, so request processing slows down. This problem can be solved with 
> the Suffix Array approach. Luckily, a Suffix Array can be constructed after 
> Lucene starts, from the existing index. Unfortunately, Suffix Arrays require a 
> lot of RAM, so we can use them only when a special flag is set:
> -Dsolr.suffixArray.enable=true
> It is possible to speed up Suffix Array initialization by using several 
> threads, so we can control the number of threads with 
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted; the default value is 5.
> The attached patch is the suggested implementation of SuffixArray support. It 
> works for all terms that start with an asterisk and contain at least 3 
> consecutive non-wildcard characters. The patch does not change search results 
> and affects only performance.
> *Update*
> suffix-array-2.patch is an improved version of the first patch. Its system 
> properties are as follows:
> {{lucene.suffixArray.enable}} - {{true}} to enable Suffix Array 
> support. Default value - {{false}}.
> {{lucene.suffixArray.initializationThreadsCount}} - number of threads for 
> Suffix Array initialization; if set to {{0}}, no additional threads are used. 
> Default value - {{5}}.
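
To make the idea in the description concrete, here is a minimal single-threaded sketch of a suffix array built over a term dictionary. It is not taken from the attached patches: the class {{TermSuffixArray}} and all of its method names are hypothetical, the sort is a naive one that compares substrings, and the two property readers only assume that the flags named above are read as ordinary JVM system properties. Every suffix of every term is stored as a (term index, offset) pair and sorted by its text, so a pattern with a leading asterisk becomes a binary search instead of a scan over all terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TermSuffixArray {
    private final String[] terms;
    private final int[][] entries; // each entry: {term index, suffix start offset}

    TermSuffixArray(String[] terms) {
        this.terms = terms;
        List<int[]> all = new ArrayList<>();
        for (int t = 0; t < terms.length; t++) {
            for (int off = 0; off < terms[t].length(); off++) {
                all.add(new int[] {t, off});
            }
        }
        // Naive construction: sort the suffixes by their text.
        all.sort((a, b) -> suffix(a).compareTo(suffix(b)));
        entries = all.toArray(new int[0][]);
    }

    private String suffix(int[] e) {
        return terms[e[0]].substring(e[1]);
    }

    // Index of the first suffix that is >= pattern.
    private int lowerBound(String pattern) {
        int lo = 0, hi = entries.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (suffix(entries[mid]).compareTo(pattern) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Terms matching *infix*: some suffix starts with the infix.
    TreeSet<String> containing(String infix) {
        TreeSet<String> hits = new TreeSet<>();
        for (int i = lowerBound(infix);
             i < entries.length && suffix(entries[i]).startsWith(infix); i++) {
            hits.add(terms[entries[i][0]]);
        }
        return hits;
    }

    // Terms matching *tail: some suffix is exactly the tail.
    TreeSet<String> endingWith(String tail) {
        TreeSet<String> hits = new TreeSet<>();
        for (int i = lowerBound(tail);
             i < entries.length && suffix(entries[i]).equals(tail); i++) {
            hits.add(terms[entries[i][0]]);
        }
        return hits;
    }

    // Assumption: the flags are plain system properties; the patch may read them differently.
    static boolean enabled() {
        return Boolean.getBoolean("lucene.suffixArray.enable"); // false when unset
    }

    static int initializationThreads() {
        return Integer.getInteger("lucene.suffixArray.initializationThreadsCount", 5);
    }
}
```

The memory cost mentioned in the description is visible here: the array holds one entry per character of every term, which is why an opt-in flag is a reasonable design.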



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

