EBernhardson has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/370496 )

Change subject: Limit the number of tokens in a phrase rescore query
......................................................................

Limit the number of tokens in a phrase rescore query

This is a brute-force approach to a problem we have with the zhwiki
analyzer and certain pathological queries. It is not a complete fix,
but it should band-aid over the most problematic effects.

A more complete solution still needs to be determined, perhaps by
rolling back or fixing the zh analysis chain.

Bug: T169498
Change-Id: I5d90c40ad5bcf2b648f4299088964c0600e3964d
---
M CirrusSearch.php
M docs/settings.txt
M includes/Query/FullTextQueryStringQueryBuilder.php
3 files changed, 24 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/96/370496/1
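
As a rough sketch of how the new setting might be switched on in LocalSettings.php: the 'token_count_router' key name is an assumption taken from the wording of the setting's documentation in this change, and the threshold of 10 is purely illustrative, not a recommended value.

```php
<?php
// LocalSettings.php (sketch only; values are illustrative assumptions)

// Assumption: the token count router is enabled through a
// 'token_count_router' key in the extra-plugin configuration.
$wgCirrusSearchWikimediaExtraPlugin['token_count_router'] = true;

// Queries that analyze to more than 10 tokens skip the phrase rescore.
$wgCirrusSearchMaxPhraseTokens = 10;
```

Leaving $wgCirrusSearchMaxPhraseTokens at its default of null keeps the limit off, since the query builder only adds the extra condition when the value is truthy.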

diff --git a/CirrusSearch.php b/CirrusSearch.php
index ce94339..bec54dd 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -1415,6 +1415,13 @@
  */
 $wgCirrusSearchInterleaveConfig = null;
 
+/**
+ * Maximum number of tokens in a phrase rescore query. Only activated
+ * when token_count_router is enabled in $wgCirrusSearchWikimediaExtraPlugin.
+ * Queries with more tokens than this skip the phrase rescore portion.
+ */
+$wgCirrusSearchMaxPhraseTokens = null;
+
 /*
  * Please update docs/settings.txt if you add new values!
  */
diff --git a/docs/settings.txt b/docs/settings.txt
index ca96f77..7f92bde 100644
--- a/docs/settings.txt
+++ b/docs/settings.txt
@@ -1512,3 +1512,12 @@
 directly, and instead set via $wgCirrusSearchUserTesting triggers. It is
 usefull to perform Team-Draft interleaved search experiments to compare the
 performance of two different search configurations.
+
+; $wgCirrusSearchMaxPhraseTokens
+
+Default:
+       $wgCirrusSearchMaxPhraseTokens = null;
+
+Maximum number of tokens in a phrase rescore query. Only activated
+when token_count_router is enabled in $wgCirrusSearchWikimediaExtraPlugin.
+Queries with more tokens than this skip the phrase rescore portion.
diff --git a/includes/Query/FullTextQueryStringQueryBuilder.php b/includes/Query/FullTextQueryStringQueryBuilder.php
index 757ed07..2ab67ed 100644
--- a/includes/Query/FullTextQueryStringQueryBuilder.php
+++ b/includes/Query/FullTextQueryStringQueryBuilder.php
@@ -669,6 +669,14 @@
                                // analyzer
                                'text_search'
                        );
+                       $maxTokens = $this->config->get( 'CirrusSearchMaxPhraseTokens' );
+                       if ( $maxTokens ) {
+                               $tokCount->addCondition(
+                                       TokenCountRouter::GT,
+                                       $maxTokens,
+                                       new \CirrusSearch\Elastica\MatchNone()
+                               );
+                       }
                        $tokCount->addCondition(
                                TokenCountRouter::GT,
                                1,

-- 
To view, visit https://gerrit.wikimedia.org/r/370496

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d90c40ad5bcf2b648f4299088964c0600e3964d
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: EBernhardson <[email protected]>
