Aaron Schulz has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/299260

Change subject: Improvements to RefreshLinksJob locking
......................................................................

Improvements to RefreshLinksJob locking

* Removed the lockAndGetLatest() call which caused contention problems.
  Previously, job #2 could block on job #1 in that method, then job #1
  yields the row lock to job #2 in LinksUpdate::acquirePageLock() by
  committing, then job #1 blocks on job #2 in updateLinksTimestamp().
  This caused timeout errors. It also is not fully safe ever since
  batching and acquirePageLock() was added.
* Add an outer getScopedLockAndFlush() call to runForTitle() which
  avoids this contention (as well as contention with page edits)
  but still prevents older jobs from clobbering newer jobs. Edits
  can happen concurrently, since they will enqueue a job post-commit
  that will block on the lock.

Change-Id: I9e2d1eefd7cbb3d2f333c595361d070527d6f0c5
---
M includes/jobqueue/jobs/RefreshLinksJob.php
1 file changed, 22 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core 
refs/changes/60/299260/1

diff --git a/includes/jobqueue/jobs/RefreshLinksJob.php 
b/includes/jobqueue/jobs/RefreshLinksJob.php
index c76ea4f..78059aa 100644
--- a/includes/jobqueue/jobs/RefreshLinksJob.php
+++ b/includes/jobqueue/jobs/RefreshLinksJob.php
@@ -128,8 +128,25 @@
         * @return bool
         */
        protected function runForTitle( Title $title ) {
+               $stats = 
MediaWikiServices::getInstance()->getStatsdDataFactory();
+
                $page = WikiPage::factory( $title );
                $page->loadPageData( WikiPage::READ_LATEST );
+
+               // Serialize links updates by page ID so they see each others' 
changes
+               $scopedLock = wfGetDB( DB_MASTER )->getScopedLockAndFlush(
+                       "RefreshLinks:pageid:{$page->getId()}",
+                       __METHOD__,
+                       15
+               );
+               if ( !$scopedLock ) {
+                       throw new RuntimeException( "Could not acquire lock on 
page #{$page->getId()}." );
+               }
+               // Get the latest ID *after* getScopedLockAndFlush() flushed 
the transaction.
+               // This is used to detect edits/moves after loadPageData() but 
before the scope lock.
+               // The works around the chicken/egg problem of determining the 
scope lock key.
+               $latest = $title->getLatestRevID( Title::GAID_FOR_UPDATE );
+
                if ( !empty( $this->params['triggeringRevisionId'] ) ) {
                        // Fetch the specified revision; lockAndGetLatest() 
below detects if the page
                        // was edited since and aborts in order to avoid 
corrupting the link tables
@@ -142,15 +159,15 @@
                        $revision = Revision::newFromTitle( $title, false, 
Revision::READ_LATEST );
                }
 
-               $stats = 
MediaWikiServices::getInstance()->getStatsdDataFactory();
-
                if ( !$revision ) {
                        $stats->increment( 'refreshlinks.rev_not_found' );
                        $this->setLastError( "Revision not found for 
{$title->getPrefixedDBkey()}" );
                        return false; // just deleted?
-               } elseif ( !$revision->isCurrent() || $revision->getPage() != 
$page->getId() ) {
-                       // If the revision isn't current, there's no point in 
doing a bunch
-                       // of work just to fail at the lockAndGetLatest() check 
later.
+               } elseif ( $revision->getId() != $latest || 
$revision->getPage() !== $page->getId() ) {
+                       // Do not clobber over newer updates with older ones. 
If all jobs where FIFO and
+                       // serialized, it would be OK to update links based on 
older revisions since it
+                       // would eventually get to the latest. Since that is 
not the case (by design),
+                       // only update the link tables to a state matching the 
current revision's output.
                        $stats->increment( 'refreshlinks.rev_not_current' );
                        $this->setLastError( "Revision {$revision->getId()} is 
not current" );
                        return false;
@@ -248,17 +265,6 @@
                                        $update->setTriggeringUser( $user );
                                }
                        }
-               }
-
-               $latestNow = $page->lockAndGetLatest();
-               if ( !$latestNow || $revision->getId() != $latestNow ) {
-                       // Do not clobber over newer updates with older ones. 
If all jobs where FIFO and
-                       // serialized, it would be OK to update links based on 
older revisions since it
-                       // would eventually get to the latest. Since that is 
not the case (by design),
-                       // only update the link tables to a state matching the 
current revision's output.
-                       $stats->increment( 'refreshlinks.rev_cas_failure' );
-                       $this->setLastError( "page_latest changed from 
{$revision->getId()} to $latestNow" );
-                       return false;
                }
 
                DataUpdate::runUpdates( $updates );

-- 
To view, visit https://gerrit.wikimedia.org/r/299260
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e2d1eefd7cbb3d2f333c595361d070527d6f0c5
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Aaron Schulz <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to