Manybubbles has uploaded a new change for review.
https://gerrit.wikimedia.org/r/94924
Change subject: Fix two more query types that cause ES to choke
......................................................................
Fix two more query types that cause ES to choke
Queries made of just spaces: This one is probably new, but it had no
regression test until this commit.
Queries with an escaped character inside quotes (mostly "/"): This one
isn't new, and someone has been retrying this query repeatedly for the
past few days. This should make it work.
Also make some small logging improvements so we can see queries that
contain trailing spaces. Those are now trimmed, but logging them will
provide more information if this breaks.
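As an illustration of the two query fixes described above, here is a minimal Ruby sketch. It is not the PHP code from CirrusSearchSearcher.php: the constant PHRASE and the method extract_phrases are hypothetical names, and the pattern is one possible way to accept backslash-escaped characters inside a quoted phrase after trimming the query.

```ruby
# Hypothetical sketch, not the CirrusSearch implementation.
# A quoted phrase: the body is one or more characters that are either
# not a quote/backslash, or a backslash followed by any character
# (so \" stays inside the phrase). An optional ~N slop suffix may follow.
PHRASE = /"((?:[^"\\]|\\.)+)"(?:~[0-9]+)?/

# Trim the query first, so a query made only of spaces yields no phrases,
# then return the body of every quoted phrase found.
def extract_phrases(query)
  query.strip.scan(PHRASE).map { |(body)| body }
end
```

With this sketch, a query consisting only of spaces produces an empty list, and a phrase such as "10.7227\"yay" is returned as a single phrase body instead of being cut at the escaped quote.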
Bug: 56790
Change-Id: Ida6815a0a205a52375e3c3c7004d026d4b1a569a
---
M includes/CirrusSearchSearcher.php
M tests/browser/features/full_text.feature
M tests/browser/features/step_definitions/transformers.rb
3 files changed, 33 insertions(+), 10 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/24/94924/1
diff --git a/includes/CirrusSearchSearcher.php b/includes/CirrusSearchSearcher.php
index fa12d76..3d98146 100644
--- a/includes/CirrusSearchSearcher.php
+++ b/includes/CirrusSearchSearcher.php
@@ -120,10 +120,11 @@
global $wgCirrusSearchPhraseRescoreBoost;
global $wgCirrusSearchPhraseRescoreWindowSize;
global $wgCirrusSearchPhraseUseText;
- wfDebugLog( 'CirrusSearch', "Searching: $term" );
+ wfDebugLog( 'CirrusSearch', "Searching: \"$term\"" );
// Transform Mediawiki specific syntax to filters and extra (pre-escaped) query string
$originalTerm = $term;
+ $term = trim( $term );
// Handle title prefix notation
wfProfileIn( __METHOD__ . '-prefix-filter' );
$prefixPos = strpos( $term, 'prefix:' );
@@ -174,7 +175,7 @@
$query = array();
$matches = array();
$offset = 0;
- while ( preg_match( '/(?<main>"([^"]+)"(?:~[0-9]+)?)(?<fuzzy>~)?/',
+ while ( preg_match( '/(?<main>"((?:[^"]|(?:\"))+)"(?:~[0-9]+)?)(?<fuzzy>~)?/',
$term, $matches, PREG_OFFSET_CAPTURE, $offset )
) {
$startOffset = $matches[ 0 ][ 1 ];
if ( $startOffset > $offset ) {
@@ -545,6 +546,7 @@
$len = strlen( $string );
for ( $i = 0; $i < $len; $i++ ) {
if ( $inEscape ) {
+ $inEscape = false;
continue;
}
switch ( $string[ $i ] ) {
diff --git a/tests/browser/features/full_text.feature b/tests/browser/features/full_text.feature
index e6d52c5..78a1d95 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -53,13 +53,16 @@
And there are no search results
And there are no errors reported
Examples:
- | term | title |
- | the empty string | Search |
- | ♙ | Search results |
- | intitle: | Search results |
- | intitle:"" | Search results |
- | incategory: | Search results |
- | incategory:"" | Search results |
+ | term | title |
+ | the empty string | Search |
+ | ♙ | Search results |
+ | intitle: | Search results |
+ | intitle:"" | Search results |
+ | incategory: | Search results |
+ | incategory:"" | Search results |
+ | %{exact: } | Search results |
+ | %{exact: } | Search results |
+ | %{exact: } | Search results |
@setup_suggestions
Scenario: Common phrases spelled incorrectly get suggestions
@@ -181,7 +184,7 @@
When I search for ffnonesensewor~0
Then there are no search results
- @setup_main
+ @setup_main @balance_quotes
Scenario Outline: Searching for for a phrase with a hanging quote adds the quote automatically
When I search for <term>
Then Two Words is the first search result
@@ -192,6 +195,19 @@
| "two words" "ffnonesenseword catapult pickles |
| "two words" pickles "ffnonesenseword catapult |
+ @balance_quotes
+ Scenario Outline: Searching for a phrase containing /, :, and \" find the page as expected
+ Given a page named <title> exists
+ When I search for <term>
+ Then <title> is the first search result
+ Examples:
+ | term                                              | title                                     |
+ | "10.1093/acprof:oso/9780195314250.003.0001"       | 10.1093/acprof:oso/9780195314250.003.0001 |
+ | "10.5194/os-8-1071-2012"                          | 10.5194/os-8-1071-2012                    |
+ | "10.7227/rie.86.2"                                | 10.7227/rie.86.2                          |
+ | "10.7227\"yay"                                    | 10.7227"yay                               |
+ | intitle:"1911 Encyclopædia Britannica/Dionysius"' | 1911 Encyclopædia Britannica/Dionysius    |
+
@setup_main
Scenario Outline: Searching for "<word> <word>"~<number> activates a proximity search
When I search for "ffnonesenseword anotherword"~<proximity>
diff --git a/tests/browser/features/step_definitions/transformers.rb b/tests/browser/features/step_definitions/transformers.rb
index 3f91b35..0320953 100644
--- a/tests/browser/features/step_definitions/transformers.rb
+++ b/tests/browser/features/step_definitions/transformers.rb
@@ -3,3 +3,8 @@
Transform(/%{epoch}/) do |param|
param.gsub('%{epoch}', $start_time.to_i.to_s)
end
+
+# Allow sending strings with trailing spaces
+Transform(/%{exact:[^}]*}/) do |param|
+ param[8..-2]
+end
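
For readers unfamiliar with the slice in the new transform: "%{exact:" is 8 characters long, so param[8..-2] drops that prefix and the closing brace, leaving the wrapped text with its spaces intact. A minimal standalone sketch of the same idea follows, assuming a hypothetical helper unwrap_exact and constant EXACT that are not part of the change and do not use Cucumber's Transform API:

```ruby
# Hypothetical helper for illustration; the actual change registers a
# Cucumber Transform instead of defining a method like this.
# Matches a whole parameter of the form %{exact:...} and captures the
# text between the colon and the closing brace, spaces included.
EXACT = /\A%\{exact:([^}]*)\}\z/

# Returns the wrapped text unchanged (leading and trailing spaces kept),
# or the parameter itself when it is not wrapped in %{exact:...}.
def unwrap_exact(param)
  match = EXACT.match(param)
  match ? match[1] : param
end
```

For a wrapped parameter, the capture group yields the same string as the param[8..-2] slice in the diff, which is what lets a feature file pass a search term made only of spaces.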
--
To view, visit https://gerrit.wikimedia.org/r/94924
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ida6815a0a205a52375e3c3c7004d026d4b1a569a
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>