[jira] [Issue Comment Edited] (LUCENE-3426) optimizer for n-gram PhraseQuery

Koji Sekiguchi (JIRA) Sun, 11 Sep 2011 19:03:33 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102405#comment-13102405
 ]


Koji Sekiguchi edited comment on LUCENE-3426 at 9/12/11 2:02 AM:
-----------------------------------------------------------------

To make this automatic in Solr, I wonder if we could move the feature into the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}


      was (Author: koji):
    To make this automatic in Solr, I wonder if we could move the feature into the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}

  
> optimizer for n-gram PhraseQuery
> --------------------------------
>
>                 Key: LUCENE-3426
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3426
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Koji Sekiguchi
>            Priority: Trivial
>         Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
> LUCENE-3426.patch, PerfTest.java, PerfTest.java
>
>
> If a 2-gram tokenizer is used and the query string is 4 characters long, for 
> example q="ABCD", QueryParser generates (when autoGeneratePhraseQueries is true) 
> PhraseQuery("AB BC CD") with slop 0. But this can be optimized to 
> PhraseQuery("AB CD") with appropriate positions.
> The idea came from the Japanese paper "N.M-gram: Implementation of Inverted 
> Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main 
> theme of the paper is different from the idea that I'm using here, though)
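
The reduction described in the issue summary above can be illustrated with a small, Lucene-free sketch. It is not taken from the attached patch; the class and method names (NGramPhraseSketch, keptPositions, describe) are hypothetical, and it assumes one Java char per character (no surrogate-pair handling). The idea shown: keep every n-th gram plus the final gram, so that every character of the query is still covered while each kept gram retains its original position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from LUCENE-3426.patch.
class NGramPhraseSketch {

    /** Start offsets of the n-grams to keep for a query of the given length. */
    static List<Integer> keptPositions(int queryLength, int n) {
        List<Integer> positions = new ArrayList<>();
        if (n <= 0 || queryLength < n) {
            return positions; // no complete n-gram can be formed
        }
        int last = queryLength - n; // offset of the final n-gram
        // Non-overlapping grams at 0, n, 2n, ... that start before the final one.
        for (int pos = 0; pos < last; pos += n) {
            positions.add(pos);
        }
        // Always keep the final gram so the tail of the query stays covered.
        positions.add(last);
        return positions;
    }

    /** Kept grams rendered as "term@position", e.g. "AB@0". */
    static List<String> describe(String query, int n) {
        List<String> terms = new ArrayList<>();
        for (int pos : keptPositions(query.length(), n)) {
            terms.add(query.substring(pos, pos + n) + "@" + pos);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(describe("ABCD", 2));  // prints [AB@0, CD@2]
        System.out.println(describe("ABCDE", 2)); // prints [AB@0, CD@2, DE@3]
    }
}
```

In Lucene itself, each kept term and its offset would presumably be passed to PhraseQuery.add(Term, int position), so that the gaps left by the dropped grams are preserved and the query still matches only the original character sequence.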
