Raphaël Badin has proposed merging 
lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #867941 in Launchpad itself: "+activereviews times out"
  https://bugs.launchpad.net/launchpad/+bug/867941

For more details, see:
https://code.launchpad.net/~rvb/launchpad/activereviews-bug-867941-hugequery/+merge/82650

= Summary =
This branch aims to improve the performance of 
code.lp.net/~user/+activereviews.  Apart from the many small queries issued by 
this page (this should be fixed by lp:~rvb/launchpad/activereviews-bug-867941) 
the main problem with this page is the huge query that is issued when the 
number of public branches for a user is large.  For instance, the page 
https://code.launchpad.net/~ubuntu-branches/+activereviews issues in production 
a 4.5s query (https://pastebin.canonical.com/55779/).  The approach here is to 
refactor the query to use a CTE to materialize once and for all the list of the 
public and private-but-visible-for-the-user branch ids instead of repeating 
this expensive subquery (in the paste above, the expensive subquery is present 
lines 10-23, 24-37, 49-62 and 66-79).  It would be an even better option to not 
even materialize this list but the code inside 
./lib/lp/code/model/branchcollection.py makes this difficult to achieve that 
without a huge refactoring.  I decided to do something simpler that should 
still improve things significantly.

= Details =
The main change here is the refactoring of _getBranchVisibilityExpression to 
use a CTE instead of a subquery. The new version of 
_getBranchVisibilityExpression populates self._with_dict which will be use to 
build the CTE. We use a dict that is then transformed into a proper "with" 
storm expression so that many calls to _addVisibleBranchesCTE will not create 
many declarations of the same CTE.
Another trick is that getMergeProposalsForPerson uses getMergeProposals and 
getMergeProposalsForReviewer and then uses an union of the results.  To avoid 
having the CTE declared in both queries, I had to restructure the code to a) be 
able to get BMP ids — hence the introduction of the parameter return_ids) b) be 
able to specify at which level the CTE must be defined — hence the introduction 
of the parameter include_with (getMergeProposals or 
getMergeProposalsForReviewer can also be used directly by external code).

= Tests =
I admit I've not created any test for this even if I've tested it manually to 
make sure that the query issued is of the right form.  I don't think a test to 
assert that the query is of the right form would be appropriate but maybe a 
reviewer will have an better idea.

= Q/A =
This branch should not change the behavior of the +activereview page but it's 
important to make sure that this really improves the performance before we 
release it.

Here are a few numbers on how the huge original query performs:
qastaging: [1860.0, 6737.0, 1846.0, 1953.0, 1860.0, 1677.0, 1734.0, 2161.0, 
2087.0, 1948.0, 2007.0, 1750.0]
prod: [4517.0, 4786.0, 4786.0, 4517.0, 4502.0, 4502.0, 4502.0, 4350.0]
-- 
https://code.launchpad.net/~rvb/launchpad/activereviews-bug-867941-hugequery/+merge/82650
Your team Launchpad code reviewers is requested to review the proposed merge of 
lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad.
=== modified file 'lib/lp/code/model/branchcollection.py'
--- lib/lp/code/model/branchcollection.py	2011-11-01 08:17:57 +0000
+++ lib/lp/code/model/branchcollection.py	2011-11-18 09:23:03 +0000
@@ -127,6 +127,16 @@
         if exclude_from_search is None:
             exclude_from_search = []
         self._exclude_from_search = exclude_from_search
+        self._with_dict = {}
+
+    @property
+    def store_with_with(self):
+        store = self.store
+        if len(self._with_dict) != 0:
+            with_expr = [With(name, expr)
+                for (name, expr) in self._with_dict.items()]
+            store = store.with_(with_expr)
+        return store
 
     def count(self):
         """See `IBranchCollection`."""
@@ -264,7 +274,8 @@
             self._tables.values() + self._asymmetric_tables.values())
         tables = [Branch] + list(all_tables)
         expressions = self._getBranchExpressions()
-        resultset = self.store.using(*tables).find(Branch, *expressions)
+        resultset = self.store_with_with.using(
+            *tables).find(Branch, *expressions)
         if not eager_load:
             return resultset
 
@@ -285,6 +296,15 @@
                           target_branch=None, merged_revnos=None,
                           eager_load=False):
         """See `IBranchCollection`."""
+        return self._getMergeProposals(
+            statuses=statuses, for_branches=for_branches,
+            target_branch=target_branch, merged_revnos=merged_revnos,
+            eager_load=eager_load)
+
+    def _getMergeProposals(self, statuses=None, for_branches=None,
+                          target_branch=None, merged_revnos=None,
+                          eager_load=False, return_ids=False,
+                          include_with=True):
         if for_branches is not None and not for_branches:
             # We have an empty branches list, so we can shortcut.
             return EmptyResultSet()
@@ -296,15 +316,19 @@
             target_branch is not None or
             merged_revnos is not None):
             return self._naiveGetMergeProposals(statuses, for_branches,
-                target_branch, merged_revnos, eager_load)
+                target_branch, merged_revnos, eager_load,
+                return_ids=return_ids, include_with=include_with)
         else:
             # When examining merge proposals in a scope, this is a moderately
             # effective set of constrained queries. It is not effective when
             # unscoped or when tight constraints on branches are present.
-            return self._scopedGetMergeProposals(statuses)
+            return self._scopedGetMergeProposals(
+                statuses, return_ids=return_ids)
 
     def _naiveGetMergeProposals(self, statuses=None, for_branches=None,
-        target_branch=None, merged_revnos=None, eager_load=False):
+                                target_branch=None, merged_revnos=None,
+                                eager_load=False, return_ids=False,
+                                include_with=True):
 
         def do_eager_load(rows):
             branch_ids = set()
@@ -361,14 +385,20 @@
         if statuses is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(statuses))
-        resultset = self.store.using(*tables).find(
-            BranchMergeProposal, *expressions)
+        if include_with:
+            store = self.store_with_with
+        else:
+            store = self.store
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
+        resultset = store.using(*tables).find(
+            returned_obj, *expressions)
         if not eager_load:
             return resultset
         else:
             return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)
 
-    def _scopedGetMergeProposals(self, statuses):
+    def _scopedGetMergeProposals(self, statuses, return_ids=False):
         scope_tables = [Branch] + self._tables.values()
         scope_expressions = self._branch_filter_expressions
         select = self.store.using(*scope_tables).find(
@@ -391,19 +421,30 @@
         if statuses is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(statuses))
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
         return self.store.with_(with_expr).using(*tables).find(
-            BranchMergeProposal, *expressions)
+            returned_obj, *expressions)
 
     def getMergeProposalsForPerson(self, person, status=None):
         """See `IBranchCollection`."""
         # We want to limit the proposals to those where the source branch is
         # limited by the defined collection.
-        owned = self.ownedBy(person).getMergeProposals(status)
-        reviewing = self.getMergeProposalsForReviewer(person, status)
-        return owned.union(reviewing)
+        owned = self.ownedBy(person)._getMergeProposals(
+            status, include_with=False, return_ids=True)
+        reviewing = self._getMergeProposalsForReviewer(
+            person, status, include_with=False, return_ids=True)
+        result = owned.union(reviewing)
+        return self.store_with_with.using([BranchMergeProposal]).find(
+            BranchMergeProposal,
+            In(BranchMergeProposal.id, result._get_select()))
 
     def getMergeProposalsForReviewer(self, reviewer, status=None):
         """See `IBranchCollection`."""
+        return self._getMergeProposalsForReviewer(reviewer, status=status)
+
+    def _getMergeProposalsForReviewer(self, reviewer, status=None,
+                                     include_with=True, return_ids=False):
         tables = [
             BranchMergeProposal,
             Join(CodeReviewVoteReference,
@@ -423,8 +464,14 @@
         if status is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(status))
-        proposals = self.store.using(*tables).find(
-            BranchMergeProposal, *expressions)
+        if include_with:
+            store = self.store_with_with
+        else:
+            store = self.store
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
+        proposals = store.using(*tables).find(
+            returned_obj, *expressions)
         # Apply sorting here as we can't do it in the browser code.  We need
         # to think carefully about the best places to do this, but not here
         # nor now.
@@ -826,21 +873,29 @@
                        Branch.transitively_private == True)))
         return private_branches
 
+    def _addVisibleBranchesCTE(self):
+        """Add the 'visible_branches' CTE to self._with_dict.
+
+        """
+        public_branches = Branch.transitively_private == False
+        if self._private_branch_ids is not None:
+            branches = Or(
+                public_branches,
+                Branch.id.is_in(self._private_branch_ids))
+        else:
+            branches = public_branches
+        branches_select = self.store.using(
+            [Branch]).find(Branch.id, branches)._get_select()
+        self._with_dict.update({'visible_branches': branches_select})
+
     def _getBranchVisibilityExpression(self, branch_class=Branch):
         """Return the where clauses for visibility.
 
         :param branch_class: The Branch class to use - permits using
             ClassAliases.
         """
-        public_branches = branch_class.transitively_private == False
-        if self._private_branch_ids is None:
-            # Public only.
-            return [public_branches]
-        else:
-            public_or_private = Or(
-                public_branches,
-                branch_class.id.is_in(self._private_branch_ids))
-            return [public_or_private]
+        self._addVisibleBranchesCTE()
+        return [branch_class.id.is_in(SQL("SELECT id FROM visible_branches"))]
 
     def _getCandidateBranchesWith(self):
         """Return WITH clauses defining candidate branches.

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-reviewers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-reviewers
More help   : https://help.launchpad.net/ListHelp

Reply via email to