Eileen has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/285585

Change subject: CRM-18842 Dedupe query: remove OR join in favour of more 
performant UNION
......................................................................

CRM-18842 Dedupe query: remove OR join in favour of more performant UNION

It probably makes more sense to QA this over on github ... 
https://github.com/civicrm/civicrm-core/pull/8251/commits

Unions are much faster than OR joins. This change took the length of the query 
to get the dedupes on a large database from
'as long as it took for the server to fall over' to less than one second on a 
small group of contacts

This query is only affecting one path - ie Individuals - at the moment as I can 
only extend that as fast as I can write tests.

Bug: T132296
Change-Id: Ib52441fbbe56b4189765db9ff17847b98917221c
---
M CRM/Dedupe/BAO/QueryBuilder.php
M CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php
2 files changed, 27 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/fundraising/crm/civicrm 
refs/changes/85/285585/1

diff --git a/CRM/Dedupe/BAO/QueryBuilder.php b/CRM/Dedupe/BAO/QueryBuilder.php
index 484ce39..3daff16 100644
--- a/CRM/Dedupe/BAO/QueryBuilder.php
+++ b/CRM/Dedupe/BAO/QueryBuilder.php
@@ -22,4 +22,28 @@
     }
   }
 
+  /**
+   * If a contact list is specified then adjust the query to ensure one 
contact is in that list.
+   *
+   * Doing an OR join here will lead to a server-killing unindexed query. 
However, a union will
+   * perform better.
+   *
+   * @param array $contactList
+   * @param string $query
+   * @param string $strID1
+   * @param string $strID2
+   *
+   * @return string
+   */
+  protected static function filterQueryByContactList(array $contactList, 
$query, $strID1 = 'contact1.id', $strID2 = 'contact2.id') {
+    if (empty($contactList)) {
+      return $query . " AND ($strID1 < $strID2)";
+    }
+    $contactIDs = implode(',', $contactList);
+    return "$query AND $strID1 IN ($contactIDs) AND $strID1 > $strID2
+      UNION $query AND $strID1 > $strID2 AND $strID2 IN ($contactIDs) AND 
$strID1 NOT IN ($contactIDs)
+    ";
+
+  }
+
 }
diff --git a/CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php 
b/CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php
index 806f023..e150f44 100644
--- a/CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php
+++ b/CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php
@@ -53,7 +53,7 @@
    * @return array
    */
   public static function internal($rg) {
-    $query = "
+    $query = self::filterQueryByContactList($rg->contactIds, "
             SELECT contact1.id as id1, contact2.id as id2, {$rg->threshold} as 
weight
             FROM civicrm_contact as contact1
               JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
@@ -63,8 +63,8 @@
               JOIN civicrm_email as email2 ON
                 email2.contact_id=contact2.id AND
                 email1.email=email2.email
-            WHERE contact1.contact_type = 'Individual'
-              AND " . self::internalFilters($rg);
+            WHERE contact1.contact_type = 'Individual'");
+
     return array(
       "civicrm_contact.{$rg->name}.{$rg->threshold}" => $query,
     );

-- 
To view, visit https://gerrit.wikimedia.org/r/285585
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib52441fbbe56b4189765db9ff17847b98917221c
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/fundraising/crm/civicrm
Gerrit-Branch: master
Gerrit-Owner: Eileen <emcnaugh...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to