jenkins-bot has submitted this change and it was merged.

Change subject: Remove support for the all field in morelike
......................................................................


Remove support for the all field in morelike

The all field was never used in production, and supporting
it requires an extra pass to fetch the text content.
Removing it also makes keyword feature construction more
homogeneous.
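
For illustration only (not code from this change): a minimal sketch of
how a settings message in the cirrussearch-morelikethis-settings format
could be parsed once "all" and "use_fields" are gone. The function name
parseMoreLikeThisSettings and its parameters are hypothetical; the real
parsing lives in includes/Hooks.php and differs in detail.

```php
<?php
// Hypothetical helper, not part of CirrusSearch. It follows the syntax
// described in the settings message: "#" starts a comment, and every
// non-blank line is "name:value".
function parseMoreLikeThisSettings( $message, array $allowedFields, $maxQueryTermsLimit ) {
	$settings = [];
	foreach ( explode( "\n", $message ) as $line ) {
		// Everything from "#" to the end of the line is a comment.
		$hash = strpos( $line, '#' );
		if ( $hash !== false ) {
			$line = substr( $line, 0, $hash );
		}
		$line = trim( $line );
		if ( $line === '' || strpos( $line, ':' ) === false ) {
			continue;
		}
		list( $name, $value ) = array_map( 'trim', explode( ':', $line, 2 ) );
		switch ( $name ) {
		case 'min_doc_freq':
		case 'max_doc_freq':
		case 'max_query_terms':
		case 'min_term_freq':
		case 'min_word_len':
		case 'max_word_len':
			if ( is_numeric( $value ) ) {
				$settings[$name] = (int)$value;
			}
			break;
		case 'minimum_should_match':
			$settings[$name] = $value;
			break;
		case 'fields':
			// Keep only fields from the allow-list; "all" is no longer in it.
			$settings['fields'] = array_values( array_intersect(
				array_map( 'trim', explode( ',', $value ) ),
				$allowedFields ) );
			break;
		// "use_fields" has no case here, so such lines are silently ignored.
		}
	}
	// Cap max_query_terms at the configured limit.
	if ( isset( $settings['max_query_terms'] )
		&& $settings['max_query_terms'] > $maxQueryTermsLimit
	) {
		$settings['max_query_terms'] = $maxQueryTermsLimit;
	}
	return $settings;
}

// Usage: the allow-list mirrors the allowed fields after this change;
// the limit of 100 is the value quoted in the settings message.
$allowed = [ 'title', 'text', 'auxiliary_text', 'opening_text', 'headings' ];
$parsed = parseMoreLikeThisSettings(
	"# example\nmax_query_terms:250\nfields:text,all\nuse_fields:true\n",
	$allowed,
	100
);
```

With this input the sketch drops "all" from the field list, ignores the
"use_fields" line, and caps max_query_terms at the limit.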

Change-Id: I055816d8f8ff115fe499dbce64fd38a0e70a4ee5
---
M CirrusSearch.php
M i18n/en.json
M includes/Hooks.php
M includes/Query/MoreLikeFeature.php
M includes/Searcher.php
M tests/browser/features/more_like_this_options.feature
M tests/unit/Query/MoreLikeFeatureTest.php
7 files changed, 4 insertions(+), 73 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified

Objections:
  Cindy-the-browser-test-bot: There's a problem with this change, please improve



diff --git a/CirrusSearch.php b/CirrusSearch.php
index a108f29..873f960 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -514,17 +514,7 @@
        'auxiliary_text',
        'opening_text',
        'headings',
-       'all'
 ];
-
-// When set to false cirrus will use the text content to build the query
-// and search on the field listed in $wgCirrusSearchMoreLikeThisFields
-// Set to true if you want to use field data as input text to build the initial
-// query.
-// Note that if the all field is used then this setting will be forced to true.
-// This is because the all field is not part of the _source and its content cannot
-// be retrieved by elasticsearch.
-$wgCirrusSearchMoreLikeThisUseFields = false;
 
 // This allows redirecting queries to a separate cluster configured
 // in $wgCirrusSearchClusters. Note that queries can use multiple features, in
diff --git a/i18n/en.json b/i18n/en.json
index f84b64f..54bd983 100644
--- a/i18n/en.json
+++ b/i18n/en.json
@@ -22,7 +22,7 @@
        "apihelp-cirrus-settings-dump-example": "Get a dump of CirrusSearch settings for this wiki.",
        "apierror-cirrus-requesttoolong": "Prefix search request was longer than the maximum allowed length. ($1 > $2)",
        "cirrussearch-give-feedback": "Give us your feedback",
-       "cirrussearch-morelikethis-settings": " #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"more like this\" feature.\n# Changes to this take effect immediately.\n# The syntax is as follows:\n#   * Everything from a \"#\" character to the end of the line is a comment.\n#   * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are:\n#   * min_doc_freq (integer): Minimum number of documents (per shard) that need a term for it to be considered.\n#   * max_doc_freq (integer): Maximum number of documents (per shard) that have a term for it to be considered.\n#                   High frequency terms are generally \"stop words\".\n#   * max_query_terms (integer): Maximum number of terms to be considered. This value is limited to $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit (100).\n#   * min_term_freq (integer): Minimum number of times the term appears in the input to doc to be considered. For small fields (title) this value should be 1.\n#   * minimum_should_match (percentage -100% to 100%, or integer number of terms): The percentage of terms to match on. Defaults to 30%.\n#   * min_word_len (integer): Minimal length of a term to be considered. Defaults to 0.\n#   * max_word_len (integer): The maximum word length above which words will be ignored. Defaults to unbounded (0).\n#   * fields (comma separated list of values): These are the fields to use. Allowed fields are title, text, auxiliary_text, opening_text, headings and all.\n#   * use_fields (true|false) : Tell the \"more like this\" query to use only the field data. Defaults to false: the system will extract the content of the text field to build the query.\n# Examples of good lines:\n# min_doc_freq:2\n# max_doc_freq:20000\n# max_query_terms:25\n# min_term_freq:2\n# minimum_should_match:30%\n# min_word_len:2\n# max_word_len:40\n# fields:text,opening_text\n# use_fields:true\n# </pre> <!-- leave this line exactly as it is -->",
+       "cirrussearch-morelikethis-settings": " #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"more like this\" feature.\n# Changes to this take effect immediately.\n# The syntax is as follows:\n#   * Everything from a \"#\" character to the end of the line is a comment.\n#   * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are:\n#   * min_doc_freq (integer): Minimum number of documents (per shard) that need a term for it to be considered.\n#   * max_doc_freq (integer): Maximum number of documents (per shard) that have a term for it to be considered.\n#                   High frequency terms are generally \"stop words\".\n#   * max_query_terms (integer): Maximum number of terms to be considered. This value is limited to $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit (100).\n#   * min_term_freq (integer): Minimum number of times the term appears in the input to doc to be considered. For small fields (title) this value should be 1.\n#   * minimum_should_match (percentage -100% to 100%, or integer number of terms): The percentage of terms to match on. Defaults to 30%.\n#   * min_word_len (integer): Minimal length of a term to be considered. Defaults to 0.\n#   * max_word_len (integer): The maximum word length above which words will be ignored. Defaults to unbounded (0).\n#   * fields (comma separated list of values): These are the fields to use. Allowed fields are title, text, auxiliary_text, opening_text, headings.\n# Examples of good lines:\n# min_doc_freq:2\n# max_doc_freq:20000\n# max_query_terms:25\n# min_term_freq:2\n# minimum_should_match:30%\n# min_word_len:2\n# max_word_len:40\n# fields:text,opening_text\n# </pre> <!-- leave this line exactly as it is -->",
        "cirrussearch-didyoumean-settings": "  #<!-- leave this line exactly as it is --> <pre>\n# This message lets you configure the settings of the \"Did you mean\" suggestions.\n# See also https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html\n# Changes to this take effect immediately.\n# The syntax is as follows:\n#   * Everything from a \"#\" character to the end of the line is a comment.\n#   * Every non-blank line is the setting name followed by a \":\" character followed by the setting value\n# The settings are :\n#   * max_errors (integer): the maximum number of terms that will be considered misspelled in order to be corrected. 1 or 2.\n#   * confidence (float): The confidence level defines a factor applied to the input phrases score which is used as a threshold for other suggestion candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of 1.0 will only return suggestions that score higher than the input phrase. If set to 0.0 the best candidate are returned.\n#   * min_doc_freq (float 0 to 1): The minimal threshold in number of documents a suggestion should appear in.\n#                   High frequency terms are generally \"stop words\".\n#   * max_term_freq (float 0 to 1): The maximum threshold in number of documents in which a term can exist in order to be included.\n#   * prefix_length (integer): The minimal number of prefix characters that must match a term in order to be a suggestion.\n#   * suggest_mode (missing, popular, always): The suggest mode controls the way suggestions are included.\n# Examples of good lines:\n# max_errors:2\n# confidence:2.0\n# max_term_freq:0.5\n# min_doc_freq:0.01\n# prefix_length:2\n# suggest_mode:always\n#\n# </pre> <!-- leave this line exactly as it is -->",
        "cirrussearch-query-too-long": "Search request is longer than the maximum allowed length. ($1 > $2)",
        "cirrussearch-completionsuggester-pref": "Completion suggester",
diff --git a/includes/Hooks.php b/includes/Hooks.php
index b7cda74..ad60b0a 100644
--- a/includes/Hooks.php
+++ b/includes/Hooks.php
@@ -204,7 +204,6 @@
         */
        private static function overrideMoreLikeThisOptionsFromMessage() {
                global $wgCirrusSearchMoreLikeThisConfig,
-                       $wgCirrusSearchMoreLikeThisUseFields,
                        $wgCirrusSearchMoreLikeThisAllowedFields,
                        $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit,
                        $wgCirrusSearchMoreLikeThisFields;
@@ -261,13 +260,6 @@
                                        array_map( 'trim', explode( ',', $v ) ),
                                        $wgCirrusSearchMoreLikeThisAllowedFields );
                                break;
-                       case 'use_fields':
-                               if ( $v === 'true' ) {
-                                       $wgCirrusSearchMoreLikeThisUseFields = true;
-                               } elseif ( $v === 'false' ) {
-                                       $wgCirrusSearchMoreLikeThisUseFields = false;
-                               }
-                               break;
                        }
                        if ( $wgCirrusSearchMoreLikeThisConfig['max_query_terms'] > $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit ) {
                                $wgCirrusSearchMoreLikeThisConfig['max_query_terms'] = $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit;
@@ -303,7 +295,6 @@
         */
        private static function overrideMoreLikeThisOptions( WebRequest $request ) {
                global $wgCirrusSearchMoreLikeThisConfig,
-                       $wgCirrusSearchMoreLikeThisUseFields,
                        $wgCirrusSearchMoreLikeThisAllowedFields,
                        $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit,
                        $wgCirrusSearchMoreLikeThisFields;
@@ -316,7 +307,6 @@
                self::overrideMinimumShouldMatch( $wgCirrusSearchMoreLikeThisConfig['minimum_should_match'], $request, 'cirrusMltMinimumShouldMatch' );
                self::overrideNumeric( $wgCirrusSearchMoreLikeThisConfig['min_word_len'], $request, 'cirrusMltMinWordLength' );
                self::overrideNumeric( $wgCirrusSearchMoreLikeThisConfig['max_word_len'], $request, 'cirrusMltMaxWordLength' );
-               self::overrideYesNo( $wgCirrusSearchMoreLikeThisUseFields, $request, 'cirrusMltUseFields' );
                $fields = $request->getVal( 'cirrusMltFields' );
                if( isset( $fields ) ) {
                        $wgCirrusSearchMoreLikeThisFields = array_intersect(
diff --git a/includes/Query/MoreLikeFeature.php b/includes/Query/MoreLikeFeature.php
index 08b2b41..01837b7 100644
--- a/includes/Query/MoreLikeFeature.php
+++ b/includes/Query/MoreLikeFeature.php
@@ -20,19 +20,10 @@
        private $config;
 
        /**
-        * @var callable Callable for fetching page from elasticsearch. See
-        *  Searcher::get.
-        */
-       private $getCallable;
-
-       /**
         * @param SearchConfig $config
-        * @param callable $getCallable Callable for fetching page from
-        *  elasticsearch. See Searcher::get.
         */
-       public function __construct( SearchConfig $config, $getCallable ) {
+       public function __construct( SearchConfig $config ) {
                $this->config = $config;
-               $this->getCallable = $getCallable;
        }
 
        /**
@@ -192,36 +183,10 @@
                }
 
                $moreLikeThisFields = $this->config->get( 'CirrusSearchMoreLikeThisFields' );
-               $moreLikeThisUseFields = $this->config->get( 'CirrusSearchMoreLikeThisUseFields' );
                sort( $moreLikeThisFields );
                $query = new \Elastica\Query\MoreLikeThis();
                $query->setParams( $this->config->get( 'CirrusSearchMoreLikeThisConfig' ) );
                $query->setFields( $moreLikeThisFields );
-
-               // The 'all' field cannot be retrieved from _source
-               // We have to extract the text content before.
-               if ( in_array( 'all', $moreLikeThisFields ) ) {
-                       $moreLikeThisUseFields = false;
-               }
-
-               if ( !$moreLikeThisUseFields && $moreLikeThisFields !== ['text'] ) {
-                       // Run a first pass to extract the text field content because we
-                       // want to compare it against other fields.
-                       $text = [];
-                       $found = call_user_func( $this->getCallable, $docIds, ['text'] );
-                       if ( !$found->isOK() ) {
-                               return null;
-                       }
-                       $found = $found->getValue();
-                       if ( !count( $found ) ) {
-                               return null;
-                       }
-                       foreach ( $found as $foundArticle ) {
-                               $text[] = $foundArticle->text;
-                       }
-                       sort( $text, SORT_STRING );
-                       $likeDocs = array_merge( $likeDocs, $text );
-               }
 
                /** @suppress PhanTypeMismatchArgument library is mis-annotated */
                $query->setLike( $likeDocs );
diff --git a/includes/Searcher.php b/includes/Searcher.php
index c44b4dd..9044304 100644
--- a/includes/Searcher.php
+++ b/includes/Searcher.php
@@ -299,7 +299,7 @@
                                // Handle morelike keyword (greedy). This needs to be the
                                // very first item until combining with other queries
                                // is worked out.
-                               new Query\MoreLikeFeature( $this->config, [$this, 'get'] ),
+                               new Query\MoreLikeFeature( $this->config ),
                                // Handle title prefix notation (greedy)
                                new Query\PrefixFeature(),
                                // Handle prefer-recent keyword
diff --git a/tests/browser/features/more_like_this_options.feature b/tests/browser/features/more_like_this_options.feature
index f41ab68..d957684 100644
--- a/tests/browser/features/more_like_this_options.feature
+++ b/tests/browser/features/more_like_this_options.feature
@@ -31,11 +31,3 @@
   Scenario: Searching for morelike:<page> with the title field and settings with poor precision
     When I set More Like This Options to title field, word length to 2 and I search for morelike:More Like Me 1
     Then ChangeMe is in the search results
-
-  Scenario: Searching for morelike:<page> with the all field works even if cirrusMtlUseFields is set to yes
-    When I set More Like This Options to all field, word length to 4 and I search for morelike:More Like Me 1
-    Then More Like Me 2 is in the search results
-      And More Like Me 3 is in the search results
-      And More Like Me 4 is in the search results
-      And More Like Me 5 is in the search results
-      But ChangeMe is not in the search results
diff --git a/tests/unit/Query/MoreLikeFeatureTest.php b/tests/unit/Query/MoreLikeFeatureTest.php
index 04ccee0..da9333c 100644
--- a/tests/unit/Query/MoreLikeFeatureTest.php
+++ b/tests/unit/Query/MoreLikeFeatureTest.php
@@ -128,14 +128,8 @@
 
                $context = new SearchContext( $config );
 
-               // This is only used for the 'all' feature which is currently
-               // untested, and is planned to be removed.
-               $getCallback = function ( array $docIds, array $fields ) {
-                       throw new \RuntimeException( 'No requests should be made to elasticsearch' );
-               };
-
                // Finally run the test
-               $feature = new MoreLikeFeature( $config, $getCallback );
+               $feature = new MoreLikeFeature( $config );
 
                $result = $feature->apply( $context, $term );
 

-- 
To view, visit https://gerrit.wikimedia.org/r/323860
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I055816d8f8ff115fe499dbce64fd38a0e70a4ee5
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: DCausse <dcau...@wikimedia.org>
Gerrit-Reviewer: Cindy-the-browser-test-bot <bernhardsone...@gmail.com>
Gerrit-Reviewer: EBernhardson <ebernhard...@wikimedia.org>
Gerrit-Reviewer: Gehel <gleder...@wikimedia.org>
Gerrit-Reviewer: Manybubbles <never...@wikimedia.org>
Gerrit-Reviewer: Siebrand <siebr...@kitano.nl>
Gerrit-Reviewer: Smalyshev <smalys...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
