[jira] [Updated] (LUCENE-9609) When the term of more than 16, highlight the query does not return

WangFeiCheng (Jira) Fri, 13 Nov 2020 23:53:03 -0800


     [ https://issues.apache.org/jira/browse/LUCENE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


WangFeiCheng updated LUCENE-9609:
---------------------------------
    Description: 
I noticed that when a query contains too many terms, highlighting for that 
query is restricted: no highlights are returned.

I know that in TermInSetQuery, when the number of terms is at most 
BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16, the query is rewritten into a 
BooleanQuery to improve query efficiency:
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold =
      Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    // Few enough terms: expand into one SHOULD clause per term.
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))),
          Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
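
To make the threshold check concrete, here is a minimal stand-alone sketch of 
the decision that rewrite makes. It does not use Lucene: the class 
RewriteChoice, the method chooseStrategy, the Strategy enum, and the 
maxClauseCount parameter (standing in for BooleanQuery.getMaxClauseCount()) 
are hypothetical names used only for illustration.

```java
// Illustrative only: a toy model of the threshold check, not Lucene code.
class RewriteChoice {
    // Same value as the constant quoted above.
    static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

    // Hypothetical labels for the two code paths described in this issue.
    enum Strategy { BOOLEAN_QUERY, TERM_IN_SET_WEIGHT }

    // maxClauseCount stands in for BooleanQuery.getMaxClauseCount() (assumption).
    static Strategy chooseStrategy(int termCount, int maxClauseCount) {
        int threshold = Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, maxClauseCount);
        return termCount <= threshold
                ? Strategy.BOOLEAN_QUERY
                : Strategy.TERM_IN_SET_WEIGHT;
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(16, 1024)); // prints BOOLEAN_QUERY
        System.out.println(chooseStrategy(17, 1024)); // prints TERM_IN_SET_WEIGHT
    }
}
```

Under this model, the 17th term is what moves the query off the BooleanQuery 
path, which matches the behaviour reported here.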
When the query has more than 16 terms, the createWeight method in 
TermInSetQuery is used instead:
{code:java}
public Weight createWeight(IndexSearcher searcher, boolean needsScores, float boost)
    throws IOException {
  return new ConstantScoreWeight(this, boost) {

    @Override
    public void extractTerms(Set<Term> terms) {
      // no-op
      // This query is for abuse cases when the number of terms is too high to
      // run efficiently as a BooleanQuery. So likewise we hide its terms in
      // order to protect highlighters
    }

    ......
  };
}
{code}
My question: why does the comment say "we hide its terms in order to protect 
highlighters"?

How does hiding the terms above this threshold protect highlighters, and how 
is this "protect highlighters" behaviour implemented?
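
One possible reading of that comment, sketched as a toy model: if a 
highlighter learns which terms to mark by calling extractTerms, then a no-op 
implementation leaves it with an empty set, so nothing is highlighted and the 
highlighter never has to process a very large term list. The code below does 
not use Lucene. The TermSource interface, the exposing and hiding helpers, and 
the highlight method are all hypothetical, and the whitespace tokenization is 
a simplification.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a toy model, not Lucene's highlighter or Weight API.
class ExtractTermsSketch {

    // Hypothetical stand-in for the term-extraction hook a highlighter relies on.
    interface TermSource {
        void extractTerms(Set<String> terms);
    }

    // Models the small-query case: every term is exposed to the caller.
    static TermSource exposing(List<String> queryTerms) {
        return terms -> terms.addAll(queryTerms);
    }

    // Models the quoted no-op: the terms stay hidden from the caller.
    static TermSource hiding() {
        return terms -> { /* no-op */ };
    }

    // Toy highlighter: wraps each space-separated token that matches an
    // extracted term in <b> tags and leaves every other token unchanged.
    static String highlight(String text, TermSource source) {
        Set<String> terms = new LinkedHashSet<>();
        source.extractTerms(terms);
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(terms.contains(token) ? "<b>" + token + "</b>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> queryTerms = Arrays.asList("lucene", "query");
        // prints: a <b>lucene</b> <b>query</b> example
        System.out.println(highlight("a lucene query example", exposing(queryTerms)));
        // prints: a lucene query example
        System.out.println(highlight("a lucene query example", hiding()));
    }
}
```

In this sketch the documents still match in both cases; only the highlighter's 
view of the query terms differs.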

 

 

  was:
I noticed that when there are too many terms, the highlighted query is 
restricted

I know that in TermInSetQuery, when there are fewer entries, 
BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16 is used to improve query efficiency
{code:java}
static final int BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD = 16;

public Query rewrite(IndexReader reader) throws IOException {
  final int threshold =
      Math.min(BOOLEAN_REWRITE_TERM_COUNT_THRESHOLD, BooleanQuery.getMaxClauseCount());
  if (termData.size() <= threshold) {
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    TermIterator iterator = termData.iterator();
    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
      bq.add(new TermQuery(new Term(iterator.field(), BytesRef.deepCopyOf(term))),
          Occur.SHOULD);
    }
    return new ConstantScoreQuery(bq.build());
  }
  return super.rewrite(reader);
}
{code}
However, when the query has more than 16 terms, the extractTerms method in 
TermInSetQuery is used

 
{code:java}
@Override
public void extractTerms(Set<Term> terms) {
  // no-op
  // This query is for abuse cases when the number of terms is too high to
  // run efficiently as a BooleanQuery. So likewise we hide its terms in
  // order to protect highlighters
}
{code}
I want to ask: why does the comment say "So likewise we hide its terms in 
order to protect highlighters"?

Why can this threshold protect highlighting, and how is this "protection" 
implemented?

 

 


> When the term of more than 16, highlight the query does not return
> ------------------------------------------------------------------
>
>                 Key: LUCENE-9609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9609
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/search
>    Affects Versions: 7.7.3
>            Reporter: WangFeiCheng
>            Priority: Minor


