Manybubbles has uploaded a new change for review.
https://gerrit.wikimedia.org/r/95697
Change subject: Allow users to prefer articles with recent changes
......................................................................
Allow users to prefer articles with recent changes
This can be engaged by default on the entire wiki (like for wikinews)
or left off by default. When off by default it can be turned on by
prefixing the query with "prefer-recent:". This engages wikinews-like
behavior on any wiki. When on by default it can be turned off by prefixing
the query with "prefer-recent:0".
It can be further customized by specifying the portion of the score
that decays with time like this: "prefer-recent:.1". Even more
customization can be had by specifying the half life of the decayed
portion of the score in days like this: "prefer-recent:,150" or
"prefer-recent:,.1". Both can be specified like this:
"prefer-recent:1,12".
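The accepted forms above can be illustrated with a small Ruby sketch that
reuses the regular expression this change adds to CirrusSearchSearcher.php.
The helper name parse_prefer_recent and its keyword arguments are
hypothetical; the default values (0, .6 and 160) are copied from the
configuration variables in this change. It is an illustration only, not the
code path the extension uses.

```ruby
# Same pattern as the one added to CirrusSearchSearcher.php in this change.
PREFER_RECENT = /prefer-recent:(1|(?:0?(?:\.[0-9]+)?))?(?:,([0-9]*\.?[0-9]+))? ?/

# Hypothetical helper: strips the prefix from the term and returns
# [remaining_term, decay_portion, half_life_days].
def parse_prefer_recent(term, default_portion: 0.0, unspecified_portion: 0.6,
                        default_half_life: 160.0)
  portion = default_portion
  half_life = default_half_life
  remaining = term.gsub(PREFER_RECENT) do
    m = Regexp.last_match
    # An empty or missing portion falls back to the "unspecified" default.
    portion = (m[1].nil? || m[1].empty?) ? unspecified_portion : m[1].to_f
    # The half life is only overridden when it follows the comma.
    half_life = m[2].to_f unless m[2].nil?
    ''
  end
  [remaining, portion, half_life]
end

p parse_prefer_recent('prefer-recent:1,12 foo')
p parse_prefer_recent('prefer-recent:,150 foo')
p parse_prefer_recent('prefer-recent:0 foo')
```

Note that "prefer-recent:0" sets the portion to 0, which is what turns the
behavior off on a wiki where it is on by default.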
This only affects full text search, not prefix search.
The performance cost of this is marginal. Tested on beta's copy of
simplewiki, there was more variation between repeated invocations of the
same query than between invocations of the query with and without the extra
math required for prefer-recent. It was a few milliseconds either way.
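The extra math amounts to one exponential per scored document. A minimal Ruby
sketch of the multiplier p(e^ct - 1) + 1 follows. It assumes the age is given
directly in days, whereas the MVEL script in this change works on millisecond
timestamps, and it uses Math.exp rather than expm1. The function name
prefer_recent_multiplier is hypothetical.

```ruby
# Hypothetical helper: factor the score is multiplied by for an article last
# updated age_days ago, following p(e^ct - 1) + 1 with t = -age_days.
def prefer_recent_multiplier(age_days, decay_portion, half_life_days)
  # Mirrors the guard in boostQuery: no decay unless both values are > 0.
  return 1.0 unless decay_portion > 0 && half_life_days > 0

  # Convert the half life in days to a decay constant per day.
  decay_constant = Math.log(2) / half_life_days
  decay_portion * (Math.exp(-decay_constant * age_days) - 1) + 1
end

# With the wikinews-like defaults, an article one half life (160 days) old
# keeps about 70% of its score.
puts prefer_recent_multiplier(160, 0.6, 160).round(3)
```

Only the decayed portion p shrinks toward 0 with age; the remaining 1 - p of
the score is unaffected, so the multiplier never drops below 1 - p.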
Change-Id: I53967ce5d210c63963a3d377450c394c6373a5bd
---
M CirrusSearch.php
M includes/CirrusSearchSearcher.php
M tests/browser/features/full_text.feature
M tests/browser/features/step_definitions/general_steps.rb
M tests/browser/features/support/hooks.rb
5 files changed, 121 insertions(+), 3 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/CirrusSearch
refs/changes/97/95697/1
diff --git a/CirrusSearch.php b/CirrusSearch.php
index ef60ca1..170387e 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -106,6 +106,21 @@
// Weight of fields relative to article text
$wgCirrusSearchWeights = array( 'title' => 20.0, 'redirect' => 15.0, 'heading'
=> 5.0 );
+// Portion of an article's score that decays with time since its last update.
Defaults to 0
+// meaning don't decay the score at all unless prefer-recent: prefixes the
query.
+$wgCirrusSearchPreferRecentDefaultDecayPortion = 0;
+
+// Portion of an article's score that decays with time if prefer-recent:
prefixes the query but
+// doesn't specify a portion. Defaults to .6 because that approximates the
behavior that
+// wikinews has been using for years. An article 160 days old is worth about
70% of its new score.
+$wgCirrusSearchPreferRecentUnspecifiedDecayPortion = .6;
+
+// Default number of days it takes the portion of an article's score that
decays with time since
+// last update to decay halfway. Used if prefer-recent: prefixes the query and
doesn't specify a
+// half life or $wgCirrusSearchPreferRecentDefaultDecayPortion is non 0.
Defaults to 160 because
+// that approximates the behavior that wikinews has been using for years.
+$wgCirrusSearchPreferRecentDefaultHalfLife = 160;
+
// How long to cache link counts for (in seconds)
$wgCirrusSearchLinkCountCacheTime = 0;
diff --git a/includes/CirrusSearchSearcher.php
b/includes/CirrusSearchSearcher.php
index 5df631b..8381a87 100644
--- a/includes/CirrusSearchSearcher.php
+++ b/includes/CirrusSearchSearcher.php
@@ -67,6 +67,16 @@
* @var string description of the current operation used in logging
errors
*/
private $description;
+ /**
+ * @var float portion of article's score which decays with time.
Defaults to 0 meaning don't decay the score
+ * with time since the last update.
+ */
+ private $preferRecentDecayPortion = 0;
+ /**
+ * @var float number of days it takes the portion of an article's
score that will decay with time
+ * since last update to decay half way. Defaults to 0 meaning don't
decay the score with time.
+ */
+ private $preferRecentHalfLife = 0;
public function __construct( $offset, $limit, $namespaces ) {
$this->offset = $offset;
@@ -123,6 +133,8 @@
global $wgCirrusSearchPhraseRescoreBoost;
global $wgCirrusSearchPhraseRescoreWindowSize;
global $wgCirrusSearchPhraseUseText;
+ global $wgCirrusSearchPreferRecentDefaultDecayPortion;
+ global $wgCirrusSearchPreferRecentDefaultHalfLife;
wfDebugLog( 'CirrusSearch', "Searching: \"$term\"" );
// Transform Mediawiki specific syntax to filters and extra
(pre-escaped) query string
@@ -146,6 +158,33 @@
}
}
wfProfileOut( __METHOD__ . '-prefix-filter' );
+
+ wfProfileIn( __METHOD__ . '-prefer-recent' );
+ $preferRecentDecayPortion =
$wgCirrusSearchPreferRecentDefaultDecayPortion;
+ $preferRecentHalfLife =
$wgCirrusSearchPreferRecentDefaultHalfLife;
+ // Matches "prefer-recent:" and then an optional floating point
number <= 1 but >= 0 (decay
+ // portion) and then an optional comma followed by another
floating point number >= 0 (half life)
+ $term = preg_replace_callback(
+
'/prefer-recent:(1|(?:0?(?:\.[0-9]+)?))?(?:,([0-9]*\.?[0-9]+))? ?/',
+ function ( $matches ) use ( &$preferRecentDecayPortion,
&$preferRecentHalfLife ) {
+ global
$wgCirrusSearchPreferRecentUnspecifiedDecayPortion;
+ if ( isset( $matches[ 1 ] ) && strlen(
$matches[ 1 ] ) ) {
+ $preferRecentDecayPortion = floatval(
$matches[ 1 ] );
+ } else {
+ $preferRecentDecayPortion =
$wgCirrusSearchPreferRecentUnspecifiedDecayPortion;
+ }
+ if ( isset( $matches[ 2 ] ) ) {
+ $preferRecentHalfLife = floatval(
$matches[ 2 ] );
+ }
+ wfDebugLog( 'CirrusSearch', "prefer recent
$preferRecentDecayPortion $preferRecentHalfLife" );
+ return '';
+ },
+ $term
+ );
+ $this->preferRecentDecayPortion = $preferRecentDecayPortion;
+ $this->preferRecentHalfLife = $preferRecentHalfLife;
+ wfProfileOut( __METHOD__ . '-prefer-recent' );
+
//Handle other filters
wfProfileIn( __METHOD__ . '-other-filters' );
$filters = $this->filters;
@@ -663,11 +702,28 @@
}
/**
- * Wrap query in link based boosts.
+ * Wrap query in link (and potentially last update time) based boosts.
* @param $query null|Elastica\Query optional query to boost. if null
the match_all is assumed
* @return query that will run $query and boost results based on links
*/
- private static function boostQuery( $query = null ) {
- return new \Elastica\Query\CustomScore( "_score *
log10(doc['links'].value + doc['redirect_links'].value + 2)", $query );
+ private function boostQuery( $query = null ) {
+ // MVEL code for incoming links boost
+ $scoreBoostMvel = " * log10(doc['links'].value +
doc['redirect_links'].value + 2)";
+ // MVEL code for last update time decay
+ $lastUpdateDecayMvel = '';
+ if ( $this->preferRecentDecayPortion > 0 &&
$this->preferRecentHalfLife > 0 ) {
+ // Convert half life for time in days to decay constant
for time in milliseconds.
+ $decayConstant = log( 2 ) / $this->preferRecentHalfLife
/ 86400000;
+ // e^ct - 1 where t is last modified time - now which
is negative
+ $exponentialDecayMvel = "Math.expm1($decayConstant *
(doc['timestamp'].value - time()))";
+ // p(e^ct - 1)
+ if ( $this->preferRecentDecayPortion !== 1.0 ) {
+ $exponentialDecayMvel = "$exponentialDecayMvel
* $this->preferRecentDecayPortion";
+ }
+ // p(e^ct - 1) + 1 which is easier to calculate than
but reduces to 1 - p + pe^ct
+ // Which breaks the score into an unscaled portion (1 -
p) and a scaled portion (p)
+ $lastUpdateDecayMvel = " * ($exponentialDecayMvel + 1)";
+ }
+ return new \Elastica\Query\CustomScore( '_score' .
$scoreBoostMvel . $lastUpdateDecayMvel, $query );
}
}
diff --git a/tests/browser/features/full_text.feature
b/tests/browser/features/full_text.feature
index 5a9d4d4..d8f8ae2 100644
--- a/tests/browser/features/full_text.feature
+++ b/tests/browser/features/full_text.feature
@@ -463,3 +463,31 @@
Scenario: wildcards don't match stemmed matches
When I search for pi*le
Then there are no search results
+
+ @prefer_recent
+ Scenario Outline: Recently updated articles are preferred if prefer-recent:
is specified
+ When I search for PreferRecent First OR Second OR Third
+ Then PreferRecent Second Second is the first search result
+ When I search for prefer-recent:<options> PreferRecent First OR Second OR
Third
+ Then PreferRecent Third is the first search result
+ Examples:
+ | options |
+ | 1,.001 |
+ | 1,0.001 |
+ | 1,.0001 |
+ | .99,.0001 |
+ | .99,.001 |
+ | .8,.0001 |
+ | .7,.0001 |
+
+ @prefer_recent
+ Scenario Outline: You can specify prefer-recent: in such a way that being
super recent isn't enough
+ When I search for prefer-recent:<options> PreferRecent First OR Second OR
Third
+ Then PreferRecent Second Second is the first search result
+ Examples:
+ | options |
+ | |
+ | 1 |
+ | 1,1 |
+ | 1,.1 |
+ | .4,.0001 |
diff --git a/tests/browser/features/step_definitions/general_steps.rb
b/tests/browser/features/step_definitions/general_steps.rb
index 2b4387d..f177146 100644
--- a/tests/browser/features/step_definitions/general_steps.rb
+++ b/tests/browser/features/step_definitions/general_steps.rb
@@ -1,6 +1,11 @@
Given(/^I am logged in$/) do
visit(LoginPage).login_with(ENV['MEDIAWIKI_USER'], ENV['MEDIAWIKI_PASSWORD'])
end
+
Given(/^I am at a random page.*$/) do
visit RandomPage
end
+
+Given(/wait ([0-9]+) seconds/) do |seconds|
+ sleep(Integer(seconds))
+end
diff --git a/tests/browser/features/support/hooks.rb
b/tests/browser/features/support/hooks.rb
index 3bb4872..b086561 100644
--- a/tests/browser/features/support/hooks.rb
+++ b/tests/browser/features/support/hooks.rb
@@ -173,3 +173,17 @@
end
$prefix_filter = true
end
+
+Before('@prefer_recent') do
+ if !$prefer_recent
+ # These are updated per process instead of per test because of the 20
second wait
+ # Note that the scores have to be close together because 20 seconds
doesn't mean a whole lot
+ steps %Q{
+ Given a page named PreferRecent First exists with contents %{epoch}
+ And a page named PreferRecent Second Second exists with contents %{epoch}
+ And wait 20 seconds
+ And a page named PreferRecent Third exists with contents %{epoch}
+ }
+ end
+ $prefer_recent = true
+end
--
To view, visit https://gerrit.wikimedia.org/r/95697
Gerrit-MessageType: newchange
Gerrit-Change-Id: I53967ce5d210c63963a3d377450c394c6373a5bd
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>