EBernhardson has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/370496 )
Change subject: Limit the number of tokens in a phrase rescore query
......................................................................
Limit the number of tokens in a phrase rescore query
This is a brute-force approach to a problem we have with the zhwiki
analyzer and certain pathological queries. It is not a complete fix,
but will hopefully mitigate some of the most problematic effects.
A more complete solution still needs to be determined, perhaps rolling
back or fixing the zh analysis chain.
Bug: T169498
Change-Id: I5d90c40ad5bcf2b648f4299088964c0600e3964d
---
M CirrusSearch.php
M docs/settings.txt
M includes/Query/FullTextQueryStringQueryBuilder.php
3 files changed, 24 insertions(+), 0 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/96/370496/1
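For anyone who wants to try the change locally, here is a minimal LocalSettings.php sketch. It assumes the extra plugin is installed on the cluster and that token_count_router is switched on through the 'token_count_router' key of $wgCirrusSearchWikimediaExtraPlugin. The key name and the value 10 are illustrative assumptions, not part of this change.

```php
<?php
// LocalSettings.php fragment (sketch; place after CirrusSearch is loaded).

// Assumption: token_count_router is enabled through this key. The new limit
// is only consulted when the router is active.
$wgCirrusSearchWikimediaExtraPlugin['token_count_router'] = true;

// Illustrative threshold: queries that analyze to more tokens than this
// skip the phrase rescore. The default of null leaves the limit disabled.
$wgCirrusSearchMaxPhraseTokens = 10;
```

With the setting left at null the new condition is never added, so existing deployments should behave as before.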
diff --git a/CirrusSearch.php b/CirrusSearch.php
index ce94339..bec54dd 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -1415,6 +1415,13 @@
*/
$wgCirrusSearchInterleaveConfig = null;
+/**
+ * Maximum number of tokens in a phrase rescore query. Only activated
+ * when token_count_router is enabled in $wgCirrusSearchWikimediaExtraPlugin.
+ * Queries with more tokens than this skip the phrase rescore portion.
+ */
+$wgCirrusSearchMaxPhraseTokens = null;
+
/*
* Please update docs/settings.txt if you add new values!
*/
diff --git a/docs/settings.txt b/docs/settings.txt
index ca96f77..7f92bde 100644
--- a/docs/settings.txt
+++ b/docs/settings.txt
@@ -1512,3 +1512,12 @@
directly, and instead set via $wgCirrusSearchUserTesting triggers. It is
usefull to perform Team-Draft interleaved search experiments to compare the
performance of two different search configurations.
+
+; $wgCirrusSearchMaxPhraseTokens
+
+Default:
+ $wgCirrusSearchMaxPhraseTokens = null;
+
+Maximum number of tokens in a phrase rescore query. Only activated
+when token_count_router is enabled in $wgCirrusSearchWikimediaExtraPlugin.
+Queries with more tokens than this skip the phrase rescore portion.
diff --git a/includes/Query/FullTextQueryStringQueryBuilder.php b/includes/Query/FullTextQueryStringQueryBuilder.php
index 757ed07..2ab67ed 100644
--- a/includes/Query/FullTextQueryStringQueryBuilder.php
+++ b/includes/Query/FullTextQueryStringQueryBuilder.php
@@ -669,6 +669,14 @@
// analyzer
'text_search'
);
+ $maxTokens = $this->config->get( 'CirrusSearchMaxPhraseTokens' );
+ if ( $maxTokens ) {
+ $tokCount->addCondition(
+ TokenCountRouter::GT,
+ $maxTokens,
+ new \CirrusSearch\Elastica\MatchNone()
+ );
+ }
$tokCount->addCondition(
TokenCountRouter::GT,
1,
--
To view, visit https://gerrit.wikimedia.org/r/370496
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d90c40ad5bcf2b648f4299088964c0600e3964d
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: EBernhardson <[email protected]>