Smalyshev has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/326037 )
Change subject: [WIP][DNM] Allow extensions to hook features
......................................................................
[WIP][DNM] Allow extensions to hook features
Also move GeoFeature to GeoData extension.
Change-Id: Id08efd46337a977639ebf3724ee3492512f326ac
---
M README
M autoload.php
A docs/hooks.txt
D includes/Query/GeoFeature.php
M includes/Search/RescoreBuilders.php
M includes/Search/SearchContext.php
M includes/Searcher.php
M profiles/RescoreProfiles.config.php
D tests/unit/Query/GeoFeatureTest.php
9 files changed, 166 insertions(+), 638 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch refs/changes/37/326037/1
diff --git a/README b/README
index 23f5f92..f76813f 100644
--- a/README
+++ b/README
@@ -260,27 +260,7 @@
Hooks
-----
-CirrusSearch provides hooks that other extensions can make use of to extend the core schema and
-modify documents.
-
-There are currently two phases to building cirrus documents: the parse phase and the links phase.
-The parse phase then the links phase is run when the article's rendered text would change (actual
-article change and template change). Only the links phase is run when an article is newly links
-or unlinked.
-
-Note that this whole thing is a somewhat experimental feature at this point and the API hasn't
-really been settled.
-
-'CirrusSearchAnalysisConfig': Allows to hook into the configuration for analysis
- &config - multi-dimensional configuration array for analysis of various languages and fields
- $builder - instance of MappingConfigBuilder, for easier use of utility methods to build fields
-
-'CirrusSearchMappingConfig': Allows configuration of the mapping of fields
- &config - multi-dimensional configuration array that contains Elasticsearch document configuration.
- The 'page' index contains configuration for Elasticsearch documents representing pages.
- The 'namespace' index contains namespace configuration for Elasticsearch documents representing
- namespaces.
-
+See docs/hooks.txt.
Validating a new version of Elasticsearch
-----------------------------------------
diff --git a/autoload.php b/autoload.php
index 92f7d56..8baa1db 100644
--- a/autoload.php
+++ b/autoload.php
@@ -112,7 +112,6 @@
'CirrusSearch\\Query\\FullTextQueryBuilder' => __DIR__ . '/includes/Query/FullTextQueryBuilder.php',
'CirrusSearch\\Query\\FullTextQueryStringQueryBuilder' => __DIR__ . '/includes/Query/FullTextQueryStringQueryBuilder.php',
'CirrusSearch\\Query\\FullTextSimpleMatchQueryBuilder' => __DIR__ . '/includes/Query/FullTextSimpleMatchQueryBuilder.php',
- 'CirrusSearch\\Query\\GeoFeature' => __DIR__ . '/includes/Query/GeoFeature.php',
'CirrusSearch\\Query\\HasTemplateFeature' => __DIR__ . '/includes/Query/HasTemplateFeature.php',
'CirrusSearch\\Query\\InCategoryFeature' => __DIR__ . '/includes/Query/InCategoryFeature.php',
'CirrusSearch\\Query\\InTitleFeature' => __DIR__ . '/includes/Query/InTitleFeature.php',
diff --git a/docs/hooks.txt b/docs/hooks.txt
new file mode 100644
index 0000000..c2e32ba
--- /dev/null
+++ b/docs/hooks.txt
@@ -0,0 +1,47 @@
+CirrusSearch provides hooks that other extensions can make use of to extend the core schema and
+modify documents.
+
+There are currently two phases to building cirrus documents: the parse phase and the links phase.
+The parse phase and then the links phase are run when the article's rendered text would change (an
+actual article change or a template change). Only the links phase is run when an article is newly linked
+or unlinked.
+
+Note that this whole thing is a somewhat experimental feature at this point and the API hasn't
+really been settled.
+
+'CirrusSearchAnalysisConfig': Allows hooking into the configuration for analysis
+ &$config - multi-dimensional configuration array for analysis of various languages and fields
+
+'CirrusSearchMappingConfig': Allows configuration of the mapping of fields
+ &$config - multi-dimensional configuration array that contains Elasticsearch document configuration.
+ The 'page' index contains configuration for Elasticsearch documents representing pages.
+ The 'namespace' index contains namespace configuration for Elasticsearch documents representing
+ namespaces.
+ $builder - instance of MappingConfigBuilder, for easier use of utility methods to build fields.
+
+'CirrusSearchBuildDocumentParse': Allows extensions to modify the Elasticsearch document produced from a page
+ $doc - \Elastica\Document object representing the page. Extensions can modify it.
+ $title - Title object representing the page.
+ $content - Content object for the page.
+ $parserOutput - ParserOutput for the page if it exists, or null.
+
+'CirrusSearchBuildDocumentLinks': Allows extensions to process incoming and outgoing links for the document.
+ $doc - \Elastica\Document object representing the page. Extensions can add links to it.
+ $title - Title object representing the page.
+ $connection - \CirrusSearch\Connection object representing the connection to the Elasticsearch server.
+
+'CirrusSearchBuildDocumentFinishBatch': Called when a batch of pages has been indexed.
+ $pages - list of WikiPage objects which have been indexed.
+
+'CirrusSearchAddQueryFeatures': Allows extensions to add query parser features
+ $config - SearchConfig object which holds the current search configuration
+ &$extraFeatures - array holding feature objects. This is where the extension should add its features.
+ The feature class should implement \CirrusSearch\Query\KeywordFeature.
+
+'CirrusSearchScoreBuilder': Allows extensions to define rescore builder functions
+ $func - function definition map, with values:
+ type - function name
+ For other parameter examples, see RescoreProfiles.config.php
+ $context - SearchContext object
+ $weight - score weight
+ &$builder - object implementing the function. Should be an instance of \CirrusSearch\Search\FunctionScoreBuilder.
\ No newline at end of file
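A minimal sketch of the CirrusSearchScoreBuilder contract, following the parameter list documented above ($func, $context, $weight, &$builder). Everything in it is an illustrative stand-in: HookRegistry replaces MediaWiki's Hooks::run(), the three CirrusSearch classes are cut down to what the example needs, and the 'example_boost' type and ExampleBoostBuilder are hypothetical. A handler should set &$builder only for function types it owns, so an unclaimed type still ends in InvalidRescoreProfileException.

```php
<?php
// Simplified stand-ins for CirrusSearch classes (assumptions, not the real code).
class SearchContext {
}

class InvalidRescoreProfileException extends Exception {
}

abstract class FunctionScoreBuilder {
	protected $context;
	protected $weight;

	public function __construct( SearchContext $context, $weight ) {
		$this->context = $context;
		$this->weight = $weight;
	}

	public function getWeight() {
		return $this->weight;
	}
}

// Hypothetical builder provided by an extension.
class ExampleBoostBuilder extends FunctionScoreBuilder {
}

// Hypothetical stand-in for MediaWiki's hook runner: every handler gets the
// same argument array, so reference elements (&$builder) stay references.
final class HookRegistry {
	private static $handlers = [];

	public static function register( $name, callable $handler ) {
		self::$handlers[$name][] = $handler;
	}

	public static function run( $name, array $args ) {
		$handlers = isset( self::$handlers[$name] ) ? self::$handlers[$name] : [];
		foreach ( $handlers as $handler ) {
			call_user_func_array( $handler, $args );
		}
	}
}

// Extension side: only claim the function type this extension implements.
HookRegistry::register( 'CirrusSearchScoreBuilder',
	function ( array $func, SearchContext $context, $weight, &$builder ) {
		if ( $func['type'] === 'example_boost' ) {
			$builder = new ExampleBoostBuilder( $context, $weight );
		}
	}
);

// Core side: simplified fallback for function types CirrusSearch does not know.
function buildExtensionScoreFunction( array $func, SearchContext $context ) {
	$weight = isset( $func['weight'] ) ? $func['weight'] : 1;
	$builder = null;
	HookRegistry::run( 'CirrusSearchScoreBuilder', [ $func, $context, $weight, &$builder ] );
	if ( !$builder ) {
		throw new InvalidRescoreProfileException( "Unknown function score type {$func['type']}." );
	}
	return $builder;
}
```

Because every registered handler sees the same &$builder, leaving it untouched for foreign types is what lets several extensions share the hook.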
diff --git a/includes/Query/GeoFeature.php b/includes/Query/GeoFeature.php
deleted file mode 100644
index 33e1b73..0000000
--- a/includes/Query/GeoFeature.php
+++ /dev/null
@@ -1,219 +0,0 @@
-<?php
-
-namespace CirrusSearch\Query;
-
-use CirrusSearch\Search\SearchContext;
-use CirrusSearch\SearchConfig;
-use Elastica\Query\AbstractQuery;
-use GeoData\GeoData;
-use GeoData\Coord;
-use GeoData\Globe;
-use Title;
-
-/**
- * Applies geo based features to the query.
- *
- * Two forms of geo based querying are provided: a filter that limits search
- * results to a geographic area and a boost that increases the score of
- * results within the geographic area. Supports specifying geo coordinates
- * either by providing a latitude and longitude, or a page title to source the
- * latitude and longitude from. All values can be prefixed with a radius in m
- * or km to apply. If not specified this defaults to 5km.
- *
- * Examples:
- * neartitle:Shanghai
- * neartitle:50km,Seoul
- * nearcoord:1.2345,-5.4321
- * nearcoord:17km,54.321,-12.345
- * boost-neartitle:"San Francisco"
- * boost-neartitle:50km,Kampala
- * boost-nearcoord:-12.345,87.654
- * boost-nearcoord:77km,34.567,76.543
- */
-class GeoFeature extends SimpleKeywordFeature {
- // Default radius, in meters
- const DEFAULT_RADIUS = 5000;
- // Default globe
- const DEFAULT_GLOBE = 'earth';
-
- /**
- * @return string[]
- */
- protected function getKeywords() {
- return ['boost-nearcoord', 'boost-neartitle', 'nearcoord', 'neartitle'];
- }
-
- /**
- * @param SearchContext $context
- * @param string $key The keyword
- * @param string $value The value attached to the keyword with quotes stripped
- * @param string $quotedValue The original value in the search string, including quotes if used
- * @param bool $negated Is the search negated? Not used to generate the returned AbstractQuery,
- * that will be negated as necessary. Used for any other building/context necessary.
- * @return array Two element array, first an AbstractQuery or null to apply to the
- * query. Second a boolean indicating if the quotedValue should be kept in the search
- * string.
- */
- protected function doApply( SearchContext $context, $key, $value, $quotedValue, $negated ) {
- if ( !class_exists( GeoData::class ) ) {
- return [ null, false ];
- }
-
- if ( substr( $key, -5 ) === 'title' ) {
- list( $coord, $radius, $excludeDocId ) = $this->parseGeoNearbyTitle(
- $context->getConfig(),
- $value
- );
- } else {
- list( $coord, $radius ) = $this->parseGeoNearby( $value );
- $excludeDocId = '';
- }
-
- $filter = null;
- if ( $coord ) {
- if ( substr( $key, 0, 6 ) === 'boost-' ) {
- $context->addGeoBoost( $coord, $radius, $negated ? 0.1 : 1 );
- } else {
- $filter = self::createQuery( $coord, $radius, $excludeDocId );
- }
- }
-
- return [ $filter, false ];
- }
-
- /**
- * radius, if provided, must have either m or km suffix. Valid formats:
- * <title>
- * <radius>,<title>
- *
- * @param SearchConfig $config the Cirrus config object
- * @param string $text user input to parse
- * @return array Three member array with Coordinate object, integer radius
- * in meters, and page id to exclude from results.. When invalid the
- * Coordinate returned will be null.
- */
- public function parseGeoNearbyTitle( SearchConfig $config, $text ) {
- $title = Title::newFromText( $text );
- if ( $title && $title->exists() ) {
- // Default radius if not provided: 5km
- $radius = self::DEFAULT_RADIUS;
- } else {
- // If the provided value is not a title try to extract a radius prefix
- // from the beginning. If $text has a valid radius prefix see if the
- // remaining text is a valid title to use.
- $pieces = explode( ',', $text, 2 );
- if ( count( $pieces ) !== 2 ) {
- return [ null, 0, '' ];
- }
- $radius = $this->parseDistance( $pieces[0] );
- if ( $radius === null ) {
- return [ null, 0, '' ];
- }
- $title = Title::newFromText( $pieces[1] );
- if ( !$title || !$title->exists() ) {
- return [ null, 0, '' ];
- }
- }
-
- $coord = GeoData::getPageCoordinates( $title );
- if ( !$coord ) {
- return [ null, 0, '' ];
- }
-
- return [ $coord, $radius, $config->makeId( $title->getArticleID() ) ];
- }
-
- /**
- * radius, if provided, must have either m or km suffix. Latitude and longitude
- * must be floats in the domain of [-90:90] for latitude and [-180,180] for
- * longitude. Valid formats:
- * <lat>,<lon>
- * <radius>,<lat>,<lon>
- *
- * @param string $text
- * @return array Two member array with Coordinate object, and integer radius
- * in meters. When invalid the Coordinate returned will be null.
- */
- public function parseGeoNearby( $text ) {
- $pieces = explode( ',', $text, 3 );
- // Default radius if not provided: 5km
- $radius = self::DEFAULT_RADIUS;
- if ( count( $pieces ) === 3 ) {
- $radius = $this->parseDistance( $pieces[0] );
- if ( $radius === null ) {
- return [ null, 0 ];
- }
- $lat = $pieces[1];
- $lon = $pieces[2];
- } elseif ( count( $pieces ) === 2 ) {
- $lat = $pieces[0];
- $lon = $pieces[1];
- } else {
- return [ null, 0 ];
- }
-
- $globe = new Globe( self::DEFAULT_GLOBE );
- if ( !$globe->coordinatesAreValid( $lat, $lon ) ) {
- return [ null, 0 ];
- }
-
- return [
- new Coord( floatval( $lat ), floatval( $lon ), $globe->getName() ),
- $radius,
- ];
- }
-
- /**
- * @param string $distance
- * @return int|null Parsed distance in meters, or null if unparsable
- */
- public function parseDistance( $distance ) {
- if ( !preg_match( '/^(\d+)(m|km|mi|ft|yd)$/', $distance, $matches ) ) {
- return null;
- }
-
- $scale = [
- 'm' => 1,
- 'km' => 1000,
- // Supported non-SI units, and their conversions, sourced from
- // https://en.wikipedia.org/wiki/Unit_of_length#Imperial.2FUS
- 'mi' => 1609.344,
- 'ft' => 0.3048,
- 'yd' => 0.9144,
- ];
-
- return max( 10, (int) round( $matches[1] * $scale[$matches[2]] ) );
- }
-
- /**
- * Create a filter for near: and neartitle: queries.
- *
- * @param Coord $coord
- * @param int $radius Search radius in meters
- * @param string $docIdToExclude Document id to exclude, or "" for no exclusions.
- * @return AbstractQuery
- */
- public static function createQuery( Coord $coord, $radius, $docIdToExclude = '' ) {
- $query = new \Elastica\Query\BoolQuery();
- $query->addFilter( new \Elastica\Query\Term( [ 'coordinates.globe' => $coord->globe ] ) );
- $query->addFilter( new \Elastica\Query\Term( [ 'coordinates.primary' => 1 ] ) );
-
- $distanceFilter = new \Elastica\Query\GeoDistance(
- 'coordinates.coord',
- [ 'lat' => $coord->lat, 'lon' => $coord->lon ],
- $radius . 'm'
- );
- $distanceFilter->setOptimizeBbox( 'indexed' );
- $query->addFilter( $distanceFilter );
-
- if ( $docIdToExclude !== '' ) {
- $query->addMustNot( new \Elastica\Query\Term( [ '_id' => $docIdToExclude ] ) );
- }
-
- $nested = new \Elastica\Query\Nested();
- $nested->setPath( 'coordinates' )->setQuery( $query );
-
- return $nested;
- }
-
-}
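With GeoFeature removed from CirrusSearch, the neartitle/nearcoord keywords keep working only if GeoData registers an equivalent feature through CirrusSearchAddQueryFeatures. A hedged sketch of such a handler: the KeywordFeature and SearchConfig declarations are simplified stand-ins (the real ones have different members), and the GeoFeature class and handler name shown are hypothetical; only the keyword list comes from the removed code.

```php
<?php
// Simplified stand-ins (assumptions): the real \CirrusSearch\Query\KeywordFeature
// interface and \CirrusSearch\SearchConfig class differ from these.
interface KeywordFeature {
	/** @return string[] keywords handled by the feature (hypothetical method) */
	public function getKeywords();
}

class SearchConfig {
}

// Hypothetical relocated feature inside GeoData; the keyword list is the one
// the removed CirrusSearch\Query\GeoFeature handled.
class GeoFeature implements KeywordFeature {
	public function getKeywords() {
		return [ 'boost-nearcoord', 'boost-neartitle', 'nearcoord', 'neartitle' ];
	}
}

// Hypothetical GeoData handler: append to the array CirrusSearch passes by
// reference rather than replacing it, so other extensions' features survive.
function onCirrusSearchAddQueryFeatures( SearchConfig $config, array &$extraFeatures ) {
	$extraFeatures[] = new GeoFeature();
}
```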
diff --git a/includes/Search/RescoreBuilders.php b/includes/Search/RescoreBuilders.php
index caf2d50..9e59ab7 100644
--- a/includes/Search/RescoreBuilders.php
+++ b/includes/Search/RescoreBuilders.php
@@ -2,10 +2,10 @@
namespace CirrusSearch\Search;
-use CirrusSearch\Query\GeoFeature;
use CirrusSearch\Util;
use Elastica\Query\FunctionScore;
use Elastica\Query\AbstractQuery;
+use Hooks;
use MWNamespace;
/**
@@ -97,6 +97,7 @@
*
* @param array $rescoreDef
* @return FunctionScore|null the rescore query
+ * @throws InvalidRescoreProfileException
*/
private function buildRescoreQuery( array $rescoreDef ) {
switch( $rescoreDef['type'] ) {
@@ -127,6 +128,7 @@
*
* @param array $profile
* @return array the supported rescore profile.
+ * @throws InvalidRescoreProfileException
*/
private function getSupportedProfile( array $profile ) {
if ( !is_array( $profile['supported_namespaces'] ) ) {
@@ -167,6 +169,7 @@
/**
* @param string $profileName the profile to load
* @return array the rescore profile identified by $profileName
+ * @throws InvalidRescoreProfileException
*/
private function getFallbackProfile( $profileName ) {
$profile = $this->context->getConfig()->getElement( 'CirrusSearchRescoreProfiles', $profileName );
@@ -221,8 +224,9 @@
* Builds a new function score chain.
*
* @param SearchContext $context
- * @param string $chainName the name of the chain (must be a valid
+ * @param string $chainName the name of the chain (must be a valid
* chain in wgCirrusSearchRescoreFunctionScoreChains)
+ * @throws InvalidRescoreProfileException
*/
public function __construct( SearchContext $context, $chainName ) {
$this->chainName = $chainName;
@@ -242,6 +246,7 @@
/**
* @return FunctionScore|null the rescore query or null none of functions were
* needed.
+ * @throws InvalidRescoreProfileException
*/
public function buildRescoreQuery() {
if ( !isset( $this->chain['functions'] ) ) {
@@ -250,6 +255,12 @@
foreach( $this->chain['functions'] as $func ) {
$impl = $this->getImplementation( $func );
$impl->append( $this->functionScore );
+ }
+ // Add extensions
+ if ( !empty( $this->chain['add_extensions'] ) ) {
+ foreach ( $this->context->getExtraScoreBuilders() as $extBuilder ) {
+ $extBuilder->append( $this->functionScore );
+ }
}
if ( !$this->functionScore->isEmptyFunction() ) {
return $this->functionScore;
@@ -260,6 +271,7 @@
/**
* @param array $func
* @return FunctionScoreBuilder
+ * @throws InvalidRescoreProfileException
*/
private function getImplementation( $func ) {
$weight = isset ( $func['weight'] ) ? $func['weight'] : 1;
@@ -286,10 +298,13 @@
return new LogMultFunctionScoreBuilder( $this->context, $weight, $func['params'] );
case 'geomean':
return new GeoMeanFunctionScoreBuilder( $this->context, $weight, $func['params'] );
- case 'georadius':
- return new GeoRadiusFunctionScoreBuilder( $this->context, $weight );
default:
- throw new InvalidRescoreProfileException( "Unknown function score type {$func['type']}." );
+ $builder = null;
+ Hooks::run( 'CirrusSearchScoreBuilder', [ $func, $this->context, $weight, &$builder ] );
+ if ( !$builder ) {
+ throw new InvalidRescoreProfileException( "Unknown function score type {$func['type']}." );
+ }
+ return $builder;
}
}
}
@@ -635,8 +650,9 @@
/**
* @param SearchContext $context
- * @param float $weight
- * @param array $profile
+ * @param float $weight
+ * @param array $profile
+ * @throws InvalidRescoreProfileException
*/
public function __construct( SearchContext $context, $weight, $profile ) {
parent::__construct( $context, $weight );
@@ -667,6 +683,7 @@
* @param float $M
* @param float $N
* @return float
+ * @throws InvalidRescoreProfileException
*/
private function findCenterFactor( $M, $N ) {
// Neutral point is found by resolving
@@ -719,8 +736,9 @@
/**
* @param SearchContext $context
- * @param float $weight
- * @param array $profile
+ * @param float $weight
+ * @param array $profile
+ * @throws InvalidRescoreProfileException
*/
public function __construct( SearchContext $context, $weight, $profile ) {
parent::__construct( $context, $weight );
@@ -779,8 +797,9 @@
/**
* @param SearchContext $context
- * @param float $weight
- * @param array $profile
+ * @param float $weight
+ * @param array $profile
+ * @throws InvalidRescoreProfileException
*/
public function __construct( SearchContext $context, $weight, $profile ) {
parent::__construct( $context, $weight );
@@ -834,8 +853,9 @@
/**
* @param SearchContext $context
- * @param float $weight
- * @param array $profile
+ * @param float $weight
+ * @param array $profile
+ * @throws InvalidRescoreProfileException
*/
public function __construct( SearchContext $context, $weight, $profile ) {
parent::__construct( $context, $weight );
@@ -951,22 +971,6 @@
}
$functionScore->addScriptScoreFunction( new \Elastica\Script\Script( $exponentialDecayExpression, $parameters, 'expression' ), null, $this->weight );
- }
-}
-
-/**
- * Builds a boost for documents based on geocoordinates.
- * Reads its params from SearchContext::geoBoost. Initialized
- * by special syntax in user query.
- */
-class GeoRadiusFunctionScoreBuilder extends FunctionScoreBuilder {
- public function append( FunctionScore $functionScore ) {
- foreach ( $this->context->getGeoBoosts() as $config ) {
- $functionScore->addWeightFunction(
- $this->weight * $config['weight'],
- GeoFeature::createQuery( $config['coord'], $config['radius'] )
- );
- }
}
}
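A sketch of the ordering that the new loop in buildRescoreQuery() establishes: the chain's own functions are appended first, and the builders collected on the SearchContext are appended only when the chain sets 'add_extensions'. FunctionScore and FixedWeightBuilder here are simplified stand-ins (assumptions) that only record weights; they are not Elastica's or CirrusSearch's classes.

```php
<?php
// Simplified stand-in (assumption) for the function score object: it only
// records the weight of each function appended to it.
class FunctionScore {
	public $weights = [];

	public function addWeight( $weight ) {
		$this->weights[] = $weight;
	}

	public function isEmptyFunction() {
		return count( $this->weights ) === 0;
	}
}

// Hypothetical builder contributing one weighted function.
class FixedWeightBuilder {
	private $weight;

	public function __construct( $weight ) {
		$this->weight = $weight;
	}

	public function append( FunctionScore $functionScore ) {
		$functionScore->addWeight( $this->weight );
	}
}

// Built-in functions always run; extension builders run only when the chain
// opts in with 'add_extensions' => true. Returns null if nothing was added.
function buildChain( array $chain, array $builtins, array $extensionBuilders ) {
	$functionScore = new FunctionScore();
	foreach ( $builtins as $impl ) {
		$impl->append( $functionScore );
	}
	if ( !empty( $chain['add_extensions'] ) ) {
		foreach ( $extensionBuilders as $extBuilder ) {
			$extBuilder->append( $functionScore );
		}
	}
	return $functionScore->isEmptyFunction() ? null : $functionScore;
}
```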
diff --git a/includes/Search/SearchContext.php b/includes/Search/SearchContext.php
index 68cf13d..7cb6e5a 100644
--- a/includes/Search/SearchContext.php
+++ b/includes/Search/SearchContext.php
@@ -69,10 +69,9 @@
private $rescoreProfile;
/**
- * @var array[] nested array of arrays. Each child array contains three keys:
- * coord, radius and weight. Used for geographic radius boosting.
+ * @var FunctionScoreBuilder[] Extra scoring builders to use.
*/
- private $geoBoosts = [];
+ private $extraScoreBuilders = [];
/**
* @var bool Could this query possibly return results?
@@ -184,6 +183,7 @@
'near_match' => 10,
'prefix' => 2,
];
+
/**
* @param SearchConfig $config
* @param int[]|null $namespaces
@@ -316,31 +316,10 @@
}
/**
- * @param string the rescore profile to use
+ * @param string $rescoreProfile the rescore profile to use
*/
public function setRescoreProfile( $rescoreProfile ) {
$this->rescoreProfile = $rescoreProfile;
- }
-
- /**
- * @return array[] nested array of arrays. Each child array contains three keys:
- * coord, radius and weight
- */
- public function getGeoBoosts() {
- return $this->geoBoosts;
- }
-
- /**
- * @param Coord $coord Coordinates to boost near
- * @param int $radius radius to boost within, in meters
- * @param float $weight Number to multiply score by when within radius
- */
- public function addGeoBoost( Coord $coord, $radius, $weight ) {
- $this->geoBoosts[] = [
- 'coord' => $coord,
- 'radius' => $radius,
- 'weight' => $weight,
- ];
}
/**
@@ -474,7 +453,7 @@
}
/**
- * @param AbstractQuery Query that should be used for highlighting if different
+ * @param AbstractQuery $query Query that should be used for highlighting if different
* from the query used for selecting.
*/
public function setHighlightQuery( AbstractQuery $query ) {
@@ -491,6 +470,7 @@
}
/**
+ * @param ResultsType $resultsType
* @return array|null Highlight portion of query to be sent to elasticsearch
*/
public function getHighlight( ResultsType $resultsType ) {
@@ -710,7 +690,8 @@
}
/**
- * @param string set the original search term
+ * Set the original search term
+ * @param string $term
*/
public function setOriginalSearchTerm( $term ) {
$this->originalSearchTerm = $term;
@@ -722,4 +703,21 @@
public function escaper() {
return $this->escaper;
}
+
+ /**
+ * @return FunctionScoreBuilder[]
+ */
+ public function getExtraScoreBuilders() {
+ return $this->extraScoreBuilders;
+ }
+
+ /**
+ * Add custom scoring function to the context.
+ * The rescore builder will pick it up.
+ * @param FunctionScoreBuilder $rescore
+ */
+ public function addCustomRescoreComponent( FunctionScoreBuilder $rescore ) {
+ $this->extraScoreBuilders[] = $rescore;
+ }
+
}
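The generic extraScoreBuilders list takes over from the geo-specific geoBoosts array. As a sketch, assuming simplified stand-in classes (the real FunctionScoreBuilder constructor also takes a SearchContext), a keyword feature that used to call addGeoBoost() would now hand the context a builder of its own. RadiusBoostBuilder and applyRadiusBoost() are hypothetical; the 0.1/1 factors are the ones the removed GeoFeature used for negated and plain boosts.

```php
<?php
// Simplified stand-in (assumption) for FunctionScoreBuilder.
abstract class FunctionScoreBuilder {
	protected $weight;

	public function __construct( $weight ) {
		$this->weight = $weight;
	}

	public function getWeight() {
		return $this->weight;
	}
}

// Hypothetical extension builder carrying what a geoBoosts entry used to
// hold: coordinates, radius in meters, and weight.
class RadiusBoostBuilder extends FunctionScoreBuilder {
	public $lat;
	public $lon;
	public $radius;

	public function __construct( $weight, $lat, $lon, $radius ) {
		parent::__construct( $weight );
		$this->lat = $lat;
		$this->lon = $lon;
		$this->radius = $radius;
	}
}

// Mirrors only the two methods this change adds to SearchContext.
class SearchContext {
	private $extraScoreBuilders = [];

	public function getExtraScoreBuilders() {
		return $this->extraScoreBuilders;
	}

	public function addCustomRescoreComponent( FunctionScoreBuilder $rescore ) {
		$this->extraScoreBuilders[] = $rescore;
	}
}

// Hypothetical boost keyword handler: a negated keyword demotes (0.1)
// instead of boosting (1).
function applyRadiusBoost( SearchContext $context, $lat, $lon, $radius, $negated ) {
	$context->addCustomRescoreComponent(
		new RadiusBoostBuilder( $negated ? 0.1 : 1, $lat, $lon, $radius )
	);
}
```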
diff --git a/includes/Searcher.php b/includes/Searcher.php
index 9aafbc8..92fc301 100644
--- a/includes/Searcher.php
+++ b/includes/Searcher.php
@@ -2,12 +2,14 @@
namespace CirrusSearch;
+use CirrusSearch\Query\KeywordFeature;
use CirrusSearch\Search\FullTextResultsType;
use CirrusSearch\Search\ResultsType;
use CirrusSearch\Search\RescoreBuilder;
use CirrusSearch\Search\SearchContext;
use CirrusSearch\Query\FullTextQueryBuilder;
use CirrusSearch\Elastica\MultiSearch as MultiSearch;
+use Elastica\Exception\RuntimeException;
use Language;
use MediaWiki\Logger\LoggerFactory;
use MediaWiki\MediaWikiServices;
@@ -17,6 +19,8 @@
use ApiUsageException;
use UsageException;
use User;
+use Hooks;
+
/**
* Performs searches using Elasticsearch. Note that each instance of this class
@@ -293,45 +297,63 @@
$builderProfile = $this->config->get( 'CirrusSearchFullTextQueryBuilderProfile' );
$builderSettings = $this->config->getElement( 'CirrusSearchFullTextQueryBuilderProfiles', $builderProfile );
+ $features = [
+ // Handle morelike keyword (greedy). This needs to be the
+ // very first item until combining with other queries
+ // is worked out.
+ new Query\MoreLikeFeature( $this->config ),
+ // Handle title prefix notation (greedy)
+ new Query\PrefixFeature(),
+ // Handle prefer-recent keyword
+ new Query\PreferRecentFeature( $this->config ),
+ // Handle local keyword
+ new Query\LocalFeature(),
+ // Handle insource keyword using regex
+ new Query\RegexInSourceFeature( $this->config ),
+ // Handle boost-templates keyword
+ new Query\BoostTemplatesFeature(),
+ // Handle hastemplate keyword
+ new Query\HasTemplateFeature(),
+ // Handle linksto keyword
+ new Query\LinksToFeature(),
+ // Handle incategory keyword
+ new Query\InCategoryFeature( $this->config ),
+ // Handle non-regex insource keyword
+ new Query\SimpleInSourceFeature(),
+ // Handle intitle keyword
+ new Query\InTitleFeature(),
+ // inlanguage keyword
+ new Query\LanguageFeature(),
+ // File types
+ new Query\FileTypeFeature(),
+ // File numeric characteristics - size, resolution, etc.
+ new Query\FileNumericFeature(),
+ ];
+
+ $extraFeatures = [];
+ Hooks::run( 'CirrusSearchAddQueryFeatures', [ $this->config, &$extraFeatures ] );
+ foreach ( $extraFeatures as $extra ) {
+ if ( $extra instanceof KeywordFeature ) {
+ $features[] = $extra;
+ } else {
+ LoggerFactory::getInstance( 'CirrusSearch' )
+ ->warning( 'Skipped invalid feature of class ' . ( is_object( $extra ) ? get_class( $extra ) : gettype( $extra ) ) );
+ }
+ }
+
+ /** @var FullTextQueryBuilder $qb */
$qb = new $builderSettings['builder_class'](
$this->config,
- [
- // Handle morelike keyword (greedy). This needs to be the
- // very first item until combining with other queries
- // is worked out.
- new Query\MoreLikeFeature( $this->config ),
- // Handle title prefix notation (greedy)
- new Query\PrefixFeature(),
- // Handle prefer-recent keyword
- new Query\PreferRecentFeature( $this->config ),
- // Handle local keyword
- new Query\LocalFeature(),
- // Handle insource keyword using regex
- new Query\RegexInSourceFeature( $this->config ),
- // Handle neartitle, nearcoord keywords, and their boosted alternates
- new Query\GeoFeature(),
- // Handle boost-templates keyword
- new Query\BoostTemplatesFeature(),
- // Handle hastemplate keyword
- new Query\HasTemplateFeature(),
- // Handle linksto keyword
- new Query\LinksToFeature(),
- // Handle incategory keyword
- new Query\InCategoryFeature( $this->config ),
- // Handle non-regex insource keyword
- new Query\SimpleInSourceFeature(),
- // Handle intitle keyword
- new Query\InTitleFeature(),
- // inlanguage keyword
- new Query\LanguageFeature(),
- // File types
- new Query\FileTypeFeature(),
- // File numeric characteristics - size, resolution, etc.
- new Query\FileNumericFeature(),
- ],
+ $features,
$builderSettings['settings']
);
+
+
+ if ( !( $qb instanceof FullTextQueryBuilder ) ) {
+ throw new RuntimeException( "Bad builder class configured: {$builderSettings['builder_class']}" );
+ }
+
$showSuggestion = $showSuggestion && $this->offset == 0
&& $this->config->get( 'CirrusSearchEnablePhraseSuggest' );
$qb->build( $this->searchContext, $term, $showSuggestion );
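The hook result is validated before it reaches the query builder. A standalone sketch of that merge step, under the assumption that a plain array of warning strings can stand in for the CirrusSearch logger; KeywordFeature, CoreFeature and ExtensionFeature are stand-ins. Non-object values are reported with gettype(), since get_class() expects an object.

```php
<?php
// Simplified stand-in (assumption) for \CirrusSearch\Query\KeywordFeature.
interface KeywordFeature {
}

class CoreFeature implements KeywordFeature {
}

class ExtensionFeature implements KeywordFeature {
}

/**
 * Append hook-supplied features after the core list, skipping anything that
 * does not implement KeywordFeature.
 *
 * @param KeywordFeature[] $features core features, in parse order
 * @param array $extraFeatures whatever hook handlers added
 * @param string[] &$warnings collected warnings (stand-in for the logger)
 * @return KeywordFeature[]
 */
function mergeExtraFeatures( array $features, array $extraFeatures, array &$warnings ) {
	foreach ( $extraFeatures as $extra ) {
		if ( $extra instanceof KeywordFeature ) {
			$features[] = $extra;
		} else {
			$type = is_object( $extra ) ? get_class( $extra ) : gettype( $extra );
			$warnings[] = 'Skipped invalid feature of class ' . $type;
		}
	}
	return $features;
}
```

Appending after the core list keeps MoreLikeFeature first, which the comment in the feature list requires.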
diff --git a/profiles/RescoreProfiles.config.php b/profiles/RescoreProfiles.config.php
index 4725b67..3ffc62e 100644
--- a/profiles/RescoreProfiles.config.php
+++ b/profiles/RescoreProfiles.config.php
@@ -185,15 +185,8 @@
// Scores documents according to their language,
// See $wgCirrusSearchLanguageWeight
[ 'type' => 'language' ],
-
- // Boosts documents in a particular geographic area.
- // Triggered by query syntax.
- [ 'type' => 'georadius', 'weight' => [
- 'value' => 2,
- 'config_override' => 'CirrusSearchPreferGeoRadiusWeight',
- 'uri_param_override' => 'cirrusPreferGeoRadiusWeight',
- ] ],
- ]
+ ],
+ 'add_extensions' => true
],
// Chain with optional functions if classic_allinone_chain
// or optional_chain is omitted from the rescore profile then some
@@ -204,12 +197,8 @@
[ 'type' => 'templates' ],
[ 'type' => 'namespaces' ],
[ 'type' => 'language' ],
- [ 'type' => 'georadius', 'weight' => [
- 'value' => 2,
- 'config_override' => 'CirrusSearchPreferGeoRadiusWeight',
- 'uri_param_override' => 'cirrusPreferGeoRadiusWeight',
- ] ],
- ]
+ ],
+ 'add_extensions' => true
],
// Chain with boostlinks only
'boostlinks_only_chain' => [
diff --git a/tests/unit/Query/GeoFeatureTest.php b/tests/unit/Query/GeoFeatureTest.php
deleted file mode 100644
index 126657c..0000000
--- a/tests/unit/Query/GeoFeatureTest.php
+++ /dev/null
@@ -1,292 +0,0 @@
-<?php
-
-namespace CirrusSearch\Query;
-
-use CirrusSearch\CirrusTestCase;
-use CirrusSearch\SearchConfig;
-use GeoData\Coord;
-use LoadBalancer;
-use IDatabase;
-use MediaWiki\MediaWikiServices;
-use Title;
-
-/**
- * Test GeoFeature functions.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
- * http://www.gnu.org/copyleft/gpl.html
- *
- * @group CirrusSearch
- */
-class GeoFeatureTest extends CirrusTestCase {
-
- public function parseDistanceProvider() {
- return [
- 'unknown units returns null' => [
- null,
- '100fur',
- ],
- 'gibberish returns null' => [
- null,
- 'gibberish',
- ],
- 'no space allowed between numbers and units' => [
- null,
- '100 m',
- ],
- 'meters' => [
- 100,
- '100m',
- ],
- 'kilometers' => [
- 1000,
- '1km',
- ],
- 'yards' => [
- 366,
- '400yd',
- ],
- 'one mile rounds down' => [
- 1609,
- '1mi',
- ],
- 'two miles rounds up' => [
- '3219',
- '2mi',
- ],
- '1000 feet rounds up' => [
- 305,
- '1000ft',
- ],
- '3000 feet rounds down' => [
- 914,
- '3000ft',
- ],
- 'small requests are bounded' => [
- 10,
- '1ft',
- ],
- 'allows large inputs' => [
- 4321000,
- '4321km',
- ],
- ];
- }
-
- /**
- * @dataProvider parseDistanceProvider
- */
- public function testParseDistance( $expected, $distance ) {
- if ( class_exists( Coord::class ) ) {
- $feature = new GeoFeature();
- $this->assertEquals( $expected, $feature->parseDistance( $distance ) );
- } else {
- $this->markTestSkipped( 'GeoData extension must be installed' );
- }
- }
-
- public function parseGeoNearbyProvider() {
- return [
- 'random input' => [
- [ null, 0 ],
- 'gibberish'
- ],
- 'random input with comma' => [
- [ null, 0 ],
- 'gibberish,42.42'
- ],
- 'random input with valid radius prefix' => [
- [ null, 0 ],
- '20km,42.42,invalid',
- ],
- 'valid coordinate, default radius' => [
- [
- [ 'lat' => 1.2345, 'lon' => 2.3456 ],
- 5000,
- ],
- '1.2345,2.3456',
- ],
- 'valid coordinate, specific radius in meters' => [
- [
- [ 'lat' => -5.4321, 'lon' => 42.345 ],
- 4321,
- ],
- '4321m,-5.4321,42.345',
- ],
- 'valid coordinate, specific radius in kilmeters' => [
- [
- [ 'lat' => 0, 'lon' => 42.345 ],
- 7000,
- ],
- '7km,0,42.345',
- ],
- 'out of bounds positive latitude' => [
- [ null, 0 ],
- '90.1,0'
- ],
- 'out of bounds negative latitude' => [
- [ null, 0 ],
- '-90.1,17',
- ],
- 'out of bounds positive longitude' => [
- [ null, 0 ],
- '49,180.1',
- ],
- 'out of bounds negative longitude' => [
- [ null, 0 ],
- '49,-180.001',
- ],
- 'valid coordinate with spaces' => [
- [
- [ 'lat' => 1.2345, 'lon' => 9.8765 ],
- 5000
- ],
- '1.2345, 9.8765'
- ],
- ];
- }
-
- /**
- * @dataProvider parseGeoNearbyProvider
- */
- public function testParseGeoNearby( $expected, $value ) {
- if ( class_exists( Coord::class ) ) {
- $feature = new GeoFeature;
- $result = $feature->parseGeoNearby( $value );
- if ( $result[0] instanceof Coord ) {
- $result[0] = [ 'lat' => $result[0]->lat, 'lon' => $result[0]->lon ];
- }
- $this->assertEquals( $expected, $result );
- } else {
- $this->markTestSkipped( 'GeoData extension must be installed' );
- }
- }
-
- public function parseGeoNearbyTitleProvider() {
- return [
- 'basic page lookup' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 5000,
- 7654321,
- ],
- 'San Francisco'
- ],
- 'basic page lookup with radius in meters' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 1234,
- 7654321,
- ],
- '1234m,San Francisco'
- ],
- 'basic page lookup with radius in kilometers' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 2000,
- 7654321,
- ],
- '2km,San Francisco'
- ],
- 'basic page lookup with space between radius and name' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 2000,
- 7654321,
- ],
- '2km, San Francisco'
- ],
- 'page with comma in name' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 5000,
- 1234567,
- ],
- 'Washington, D.C.'
- ],
- 'page with comma in name and radius in kilometers' => [
- [
- [ 'lat' => 1.2345, 'lon' => 5.4321 ],
- 7000,
- 1234567,
- ],
- '7km,Washington, D.C.'
- ],
- 'unknown page lookup' => [
- [ null, 0, '' ],
- 'Unknown Title',
- ],
- 'unknown page lookup with radius' => [
- [ null, 0, '' ],
- '4km, Unknown Title',
- ],
- ];
- }
-
- /**
- * @dataProvider parseGeoNearbyTitleProvider
- */
- public function testParseGeoNearbyTitle( $expected, $value ) {
- if ( ! class_exists( Coord::class ) ) {
- $this->markTestSkipped( 'GeoData extension must be installed' );
- return;
- }
-
- // Replace database with one that will return our fake coordinates if asked
- $db = $this->getMock( IDatabase::class );
- $db->expects( $this->any() )
- ->method( 'select' )
- ->with( 'geo_tags', $this->anything(), $this->anything(), $this->anything() )
- ->will( $this->returnValue( [
- (object) [ 'gt_lat' => 1.2345, 'gt_lon' => 5.4321 ],
- ] ) );
- // Tell LinkCache all titles not explicitly added don't exist
- $db->expects( $this->any() )
- ->method( 'selectRow' )
- ->with( 'page', $this->anything(), $this->anything(), $this->anything() )
- ->will( $this->returnValue( false ) );
- // Inject mock database into a mock LoadBalancer
- $lb = $this->getMockBuilder( LoadBalancer::class )
- ->disableOriginalConstructor()
- ->getMock();
- $lb->expects( $this->any() )
- ->method( 'getConnection' )
- ->will( $this->returnValue( $db ) );
- $this->setService( 'DBLoadBalancer', $lb );
-
- // Inject fake San Francisco page into LinkCache so it "exists"
- MediaWikiServices::getInstance()->getLinkCache()
- ->addGoodLinkObj( 7654321, Title::newFromText( 'San Francisco' ) );
- // Inject fake page with comma in it as well
- MediaWikiServices::getInstance()->getLinkCache()
- ->addGoodLinkObj( 1234567, Title::newFromText( 'Washington, D.C.' ) );
-
- $config = $this->getMock( SearchConfig::class );
- $config->expects( $this->any() )
- ->method( 'makeId' )
- ->will( $this->returnCallback( function ( $id ) {
- return $id;
- } ) );
-
- // Finally run the test
- $feature = new GeoFeature;
- $result = $feature->parseGeoNearbyTitle( $config, $value );
- if ( $result[0] instanceof Coord ) {
- $result[0] = [ 'lat' => $result[0]->lat, 'lon' => $result[0]->lon ];
- }
-
- $this->assertEquals( $expected, $result );
- }
-}
--
To view, visit https://gerrit.wikimedia.org/r/326037
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id08efd46337a977639ebf3724ee3492512f326ac
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Smalyshev
_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits