jenkins-bot has submitted this change and it was merged.
Change subject: Fix two more query types that cause ES to choke
......................................................................
Fix two more query types that cause ES to choke
Queries made of just spaces: this failure is probably new, but it had no
regression test until this commit.
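The regression test for spaces-only queries needs a way to pass literal whitespace through a Cucumber step. That is what the `%{exact:...}` transform added in transformers.rb is for.

The sketch below is a minimal standalone version of that transform's slicing. It assumes a plain Ruby method in place of Cucumber's `Transform` hook; the name `exact_param` is hypothetical. It shows what reaches the search box, and what the new `trim()` call on the PHP side reduces it to. Ruby's `String#strip` approximates PHP's `trim()` here.

```ruby
# Hypothetical stand-in for the Cucumber Transform in transformers.rb:
# same pattern, same slice, but callable without Cucumber.
EXACT_PATTERN = /%{exact:[^}]*}/

def exact_param(param)
  return param unless param =~ EXACT_PATTERN
  # "%{exact:" is 8 characters long; -2 drops the closing "}".
  param[8..-2]
end

sent = exact_param('%{exact:   }')
puts sent.inspect        # the whitespace the browser test types
puts sent.strip.inspect  # what is left once the query is trimmed
```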
Queries with a quoted term that needs escaping (mostly "/"): this failure
isn't new, and someone has been retrying this query repeatedly for the
past few days. This change should make it work.
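The escaped-quote fix hinges on the phrase pattern that this change puts into `CirrusSearchSearcher.php`. The sketch below ports that pattern verbatim to Ruby (Onigmo, which treats these constructs the same way PCRE does) and checks the examples listed in the new code comment. The helper name `phrase_parts` is hypothetical.

One thing the sketch makes visible: inside the regex, `\"` is just a literal `"`. That means the alternation `[^"]|\"` accepts any character, so a match runs greedily to the last quote it can reach.

```ruby
# Phrase pattern copied from the PHP change. In a single-quoted string
# (PHP or Ruby) the backslash in \" survives, and the regex engine then
# reads \" as an ordinary literal double quote.
PHRASE = Regexp.new('(?<main>"((?:[^"]|(?:\"))+)"(?:~[0-9]+)?)(?<fuzzy>~)?')

# Hypothetical helper: returns [main, fuzzy] for the first match, or nil.
def phrase_parts(query)
  m = PHRASE.match(query)
  m && [m[:main], m[:fuzzy]]
end

# The examples from the code comment, all expected to match.
['"a"', '"a boat"', '"a\"boat"', '"a boat"~', '"a boat"~9', '"a boat"~9~'].each do |q|
  p [q, phrase_parts(q)]
end

# Two separate phrases: the greedy group swallows the middle quotes.
p phrase_parts('"a" "b"')
```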
Also some small logging improvements, so that queries with trailing spaces
are visible in the log. Those spaces should be trimmed, but logging them
will provide more information if trimming breaks.
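Wrapping the logged term in quotes is what makes trailing spaces visible. Without the quotes, the log line would end in invisible whitespace.

Below is a small Ruby sketch of the same two steps: log first, trim second. The helper names `log_line` and `normalize_term` are hypothetical. Ruby's `String#strip` stands in for PHP's `trim()`; their default character sets are close but not identical.

```ruby
# Hypothetical helpers mirroring the PHP change: log the raw term inside
# quotes so trailing whitespace shows up, then trim before parsing.
def log_line(term)
  %(Searching: "#{term}")
end

def normalize_term(term)
  term.strip
end

raw = "two words   "
puts log_line(raw)               # the quotes expose the trailing spaces
puts normalize_term(raw).inspect

# A query of nothing but spaces trims down to the empty string.
puts normalize_term("     ").empty?
```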
Bug: 56790
Change-Id: Ida6815a0a205a52375e3c3c7004d026d4b1a569a
---
M includes/CirrusSearchSearcher.php
M tests/browser/features/full_text.feature
M tests/browser/features/step_definitions/transformers.rb
3 files changed, 37 insertions(+), 10 deletions(-)
Approvals:
Chad: Looks good to me, approved
jenkins-bot: Verified
diff --git a/includes/CirrusSearchSearcher.php b/includes/CirrusSearchSearcher.php
index e7d7d47..5df631b 100644
--- a/includes/CirrusSearchSearcher.php
+++ b/includes/CirrusSearchSearcher.php
@@ -123,10 +123,11 @@
global $wgCirrusSearchPhraseRescoreBoost;
global $wgCirrusSearchPhraseRescoreWindowSize;
global $wgCirrusSearchPhraseUseText;
- wfDebugLog( 'CirrusSearch', "Searching: $term" );
+ wfDebugLog( 'CirrusSearch', "Searching: \"$term\"" );
// Transform Mediawiki specific syntax to filters and extra (pre-escaped) query string
$originalTerm = $term;
+ $term = trim( $term );
// Handle title prefix notation
wfProfileIn( __METHOD__ . '-prefix-filter' );
$prefixPos = strpos( $term, 'prefix:' );
@@ -174,7 +175,11 @@
$this->filters = $filters;
wfProfileOut( __METHOD__ . '-other-filters' );
wfProfileIn( __METHOD__ . '-switch-phrase-queries-to-plain' );
- $query = self::replacePartsOfQuery( $term, '/(?<main>"([^"]+)"(?:~[0-9]+)?)(?<fuzzy>~)?/',
+ // Match quoted phrases including those containing escaped quotes
+ // Those phrases can optionally be followed by ~ then a number (this is the phrase slop)
+ // That can optionally be followed by a ~ (this matches stemmed words in phrases)
+ // The following all match: "a", "a boat", "a\"boat", "a boat"~, "a boat"~9, "a boat"~9~
+ $query = self::replacePartsOfQuery( $term, '/(?<main>"((?:[^"]|(?:\"))+)"(?:~[0-9]+)?)(?<fuzzy>~)?/',
function ( $matches ) use ( $showRedirects ) {
$main = CirrusSearchSearcher::fixupQueryStringPart( $matches[ 'main' ][ 0 ] );
if ( !isset( $matches[ 'fuzzy' ] ) ) {
@@ -605,6 +610,7 @@
$len = strlen( $string );
for ( $i = 0; $i < $len; $i++ ) {
if ( $inEscape ) {
+ $inEscape = false;
continue;
}
switch ( $string[ $i ] ) {
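The one-line `$inEscape = false;` addition matters because, without it, the first backslash puts the loop into escape mode permanently. Every later character is then skipped.

The switch body is not shown in this diff. For illustration only, the sketch below therefore assumes a loop that counts unescaped double quotes and appends a closing quote when one is left open, in the spirit of the `@balance_quotes` scenarios. `balance_quotes` and its `reset_escape:` flag are hypothetical; the flag exists only to contrast the old and new behavior.

```ruby
# Hypothetical escape-aware scan. reset_escape: false reproduces the
# behavior before this change, where the escape flag was never cleared.
def balance_quotes(query, reset_escape: true)
  in_escape = false
  in_quote = false
  query.each_char do |c|
    if in_escape
      in_escape = false if reset_escape
      next # the escaped character is skipped either way
    end
    case c
    when '\\' then in_escape = true
    when '"' then in_quote = !in_quote
    end
  end
  in_quote ? query + '"' : query
end

q = '"10.7227\"yay"'
p balance_quotes(q)                      # quotes already balanced: unchanged
p balance_quotes(q, reset_escape: false) # old behavior appends a stray quote
p balance_quotes('"two words')           # a hanging quote gets closed
```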
diff --git a/tests/browser/features/full_text.feature b/tests/browser/features/full_text.feature
index c83e368..5a9d4d4 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -49,13 +49,16 @@
And there are no search results
And there are no errors reported
Examples:
- | term | title |
- | the empty string | Search |
- | ♙ | Search results |
- | intitle: | Search results |
- | intitle:"" | Search results |
- | incategory: | Search results |
- | incategory:"" | Search results |
+ | term | title |
+ | the empty string | Search |
+ | ♙ | Search results |
+ | intitle: | Search results |
+ | intitle:"" | Search results |
+ | incategory: | Search results |
+ | incategory:"" | Search results |
+ | %{exact: } | Search results |
+ | %{exact: } | Search results |
+ | %{exact: } | Search results |
@setup_suggestions
Scenario: Common phrases spelled incorrectly get suggestions
@@ -177,7 +180,7 @@
When I search for ffnonesensewor~0
Then there are no search results
- @setup_main
+ @setup_main @balance_quotes
Scenario Outline: Searching for for a phrase with a hanging quote adds the quote automatically
When I search for <term>
Then Two Words is the first search result
@@ -188,6 +191,19 @@
| "two words" "ffnonesenseword catapult pickles |
| "two words" pickles "ffnonesenseword catapult |
+ @balance_quotes
+ Scenario Outline: Searching for a phrase containing /, :, and \" find the page as expected
+ Given a page named <title> exists
+ When I search for <term>
+ Then <title> is the first search result
+ Examples:
+ | term | title |
+ | "10.1093/acprof:oso/9780195314250.003.0001" | 10.1093/acprof:oso/9780195314250.003.0001 |
+ | "10.5194/os-8-1071-2012" | 10.5194/os-8-1071-2012 |
+ | "10.7227/rie.86.2" | 10.7227/rie.86.2 |
+ | "10.7227\"yay" | 10.7227"yay |
+ | intitle:"1911 Encyclopædia Britannica/Dionysius"' | 1911 Encyclopædia Britannica/Dionysius |
+
@setup_main
Scenario Outline: Searching for "<word> <word>"~<number> activates a proximity search
When I search for "ffnonesenseword anotherword"~<proximity>
diff --git a/tests/browser/features/step_definitions/transformers.rb b/tests/browser/features/step_definitions/transformers.rb
index 3f91b35..0320953 100644
--- a/tests/browser/features/step_definitions/transformers.rb
+++ b/tests/browser/features/step_definitions/transformers.rb
@@ -3,3 +3,8 @@
Transform(/%{epoch}/) do |param|
param.gsub('%{epoch}', $start_time.to_i.to_s)
end
+
+# Allow sending strings with trailing spaces
+Transform(/%{exact:[^}]*}/) do |param|
+ param[8..-2]
+end
--
To view, visit https://gerrit.wikimedia.org/r/94924
Gerrit-MessageType: merged
Gerrit-Change-Id: Ida6815a0a205a52375e3c3c7004d026d4b1a569a
Gerrit-PatchSet: 3
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles
Gerrit-Reviewer: Chad
Gerrit-Reviewer: Cmcmahon
Gerrit-Reviewer: Manybubbles
Gerrit-Reviewer: jenkins-bot
_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits