jenkins-bot has submitted this change and it was merged.

Change subject: Prioritize single page edits over template changes
......................................................................


Prioritize single page edits over template changes

Splits single page edits onto another job type from template changes.
Because the job queue gives each job type equal attention and each job
type gets its own fifo queue the will prevent updates from templates from
getting "in front" of single page updates.  It won't actually cause single
page updates to be done before any template updates, in fact both types
will get equal weighting.  In effect, though, this will prioritize single
page updates because there is much less likely to be a huge backlog of
them.  Template updates, on the other hand, can easilly cause a large
backlog of search updates to be queued.

The goal here is to make sure single page updates get into the seaerch
index fast, like we advertise Cirrus is capable of, even if there are
tons of template changes going on.  We'll still work on template changes,
but they can wait.

This also turns on deduplication for cirrus jobs which _should_ be safe
because the deduplication only happens to unclaimed jobs.  It wouldn't be
safe to deduplicate claimed Cirrus jobs.  It should help if there is a
huge backlog of Cirrus jobs.  OTOH, we don't do anything special to make
sure that jobs are deduplicate-able.  Frankly it'd be pretty easy to make
jobs mergeable and save tons of execution time but that isn't an option
at this point.

Change-Id: I2c8ddd68dc22aca3d92ef4f7f72a68e6dbe09986
---
M CirrusSearch.php
M includes/CirrusSearchHooks.php
M includes/CirrusSearchJob.php
M includes/CirrusSearchLinksUpdateJob.php
M includes/CirrusSearchUpdater.php
5 files changed, 46 insertions(+), 6 deletions(-)

Approvals:
  Aaron Schulz: Looks good to me, but someone else must approve
  Chad: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/CirrusSearch.php b/CirrusSearch.php
index 24bce20..317d41e 100644
--- a/CirrusSearch.php
+++ b/CirrusSearch.php
@@ -242,5 +242,6 @@
  */
 $wgJobClasses[ 'cirrusSearchDeletePages' ] = 'CirrusSearchDeletePagesJob';
 $wgJobClasses[ 'cirrusSearchLinksUpdate' ] = 'CirrusSearchLinksUpdateJob';
+$wgJobClasses[ 'cirrusSearchLinksUpdatePrioritized' ] = 
'CirrusSearchLinksUpdateJob';
 $wgJobClasses[ 'cirrusSearchOtherIndex' ] = 'CirrusSearchOtherIndexJob';
 $wgJobClasses[ 'cirrusSearchUpdatePages' ] = 'CirrusSearchUpdatePagesJob';
diff --git a/includes/CirrusSearchHooks.php b/includes/CirrusSearchHooks.php
index b377404..03a288c 100644
--- a/includes/CirrusSearchHooks.php
+++ b/includes/CirrusSearchHooks.php
@@ -151,11 +151,17 @@
         * @return bool
         */
        public static function linksUpdateCompletedHook( $linksUpdate ) {
-               $job = new CirrusSearchLinksUpdateJob( 
$linksUpdate->getTitle(), array(
+               $params = array(
                        'addedLinks' => $linksUpdate->getAddedLinks(),
                        'removedLinks' => $linksUpdate->getRemovedLinks(),
                        'primary' => true,
-               ) );
+               );
+               // Prioritize jobs that are triggered from a web process.  This 
should prioritize
+               // single page update jobs over those triggered by template 
changes.
+               if ( PHP_SAPI != 'cli' ) {
+                       $params[ 'prioritize' ] = true;
+               }
+               $job = new CirrusSearchLinksUpdateJob( 
$linksUpdate->getTitle(), $params );
                JobQueueGroup::singleton()->push( $job );
                return true;
        }
diff --git a/includes/CirrusSearchJob.php b/includes/CirrusSearchJob.php
index 1d15d77..75087a4 100644
--- a/includes/CirrusSearchJob.php
+++ b/includes/CirrusSearchJob.php
@@ -22,6 +22,12 @@
                // eg: CirrusSearchDeletePagesJob -> cirrusSearchDeletePages
                $jobName = lcfirst( str_replace( 'Job', '', get_class( $this ) 
) );
                parent::__construct( $jobName, $title, $params, $id );
+
+               // All CirrusSearch jobs are reasonably expensive.  Most 
involve parsing and it
+               // is ok to remove duplicate _unclaimed_ cirrus jobs.  Once a 
cirrus job is claimed
+               // it can't be deduplicated or else the search index will end 
up with out of date
+               // data.  Luckily, this is how the JobQueue implementations 
work.
+               $this->removeDuplicates = true;
        }
 
        /**
diff --git a/includes/CirrusSearchLinksUpdateJob.php 
b/includes/CirrusSearchLinksUpdateJob.php
index 81bcf57..5c6d307 100644
--- a/includes/CirrusSearchLinksUpdateJob.php
+++ b/includes/CirrusSearchLinksUpdateJob.php
@@ -2,7 +2,12 @@
 /**
  * Performs the appropriate updates to Elasticsearch after a LinksUpdate is
  * completed.  The page itself is updated first then a second copy of this job
- * is queued to update linked articles.
+ * is queued to update linked articles if any links change.  The job can be
+ * 'prioritized' via the 'prioritize' parameter which will switch it to a
+ * different queue then the non-prioritized jobs.  Prioritized jobs will never
+ * be deduplicated with non-prioritized jobs which is good because we can't
+ * control which job is removed during deduplication.  In our case it'd only be
+ * ok to remove the non-prioritized version.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -20,16 +25,31 @@
  * http://www.gnu.org/copyleft/gpl.html
  */
 class CirrusSearchLinksUpdateJob extends CirrusSearchJob {
+       public function __construct( $title, $params, $id = 0 ) {
+               parent::__construct( $title, $params, $id );
+
+               if ( $this->isPrioritized() ) {
+                       $this->command .= 'Prioritized';
+               }
+               // Note that we have to keep the prioritized param or else when 
the job
+               // is loaded it'll load under a different name/command/type 
which would
+               // be confusing.
+       }
+
        protected function doJob() {
                if ( $this->params[ 'primary' ] ) {
                        CirrusSearchUpdater::updateFromTitle( $this->title );
                        if ( count( $this->params[ 'addedLinks' ] ) > 0 ||
                                        count( $this->params[ 'removedLinks' ] 
) > 0 ) {
-                               $next = new CirrusSearchLinksUpdateJob( 
$this->title, array(
+                               $params = array(
                                        'addedLinks' => $this->params[ 
'addedLinks' ],
                                        'removedLinks' => $this->params[ 
'removedLinks' ],
                                        'primary' => false,
-                               ) );
+                               );
+                               if ( $this->isPrioritized() ) {
+                                       $params[ 'prioritize' ] = true;
+                               }
+                               $next = new CirrusSearchLinksUpdateJob( 
$this->title, $params );
                                JobQueueGroup::singleton()->push( $next );
                        }
                } else {
@@ -37,4 +57,11 @@
                                $this->params[ 'removedLinks' ] );
                }
        }
+
+       /**
+        * @return is this job prioritized?
+        */
+       private function isPrioritized() {
+               return isset( $this->params[ 'prioritize' ] ) && $this->params[ 
'prioritize' ];
+       }
 }
diff --git a/includes/CirrusSearchUpdater.php b/includes/CirrusSearchUpdater.php
index 3e0bb65..f2f4621 100644
--- a/includes/CirrusSearchUpdater.php
+++ b/includes/CirrusSearchUpdater.php
@@ -2,7 +2,7 @@
 /**
  * Performs updates and deletes on the Elasticsearch index.  Called by
  * CirrusSearch.body.php (our SearchEngine implementation), forceSearchIndex
- * (for bulk updates), and hooked into LinksUpdate (for implied updates).
+ * (for bulk updates), and CirrusSearch's jobs.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by

-- 
To view, visit https://gerrit.wikimedia.org/r/104124
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2c8ddd68dc22aca3d92ef4f7f72a68e6dbe09986
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: Manybubbles <[email protected]>
Gerrit-Reviewer: Aaron Schulz <[email protected]>
Gerrit-Reviewer: Chad <[email protected]>
Gerrit-Reviewer: Manybubbles <[email protected]>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to