James Westby has proposed merging lp:~james-w/python-oops-tools/recent-oopses 
into lp:python-oops-tools.

Commit message:
Improve the query performance of the recent oopses view.

Firstly the planner was picking a difficult plan because of the join, but
we expect few prefixes per-report, so getting the prefix ids first, then
using a simple IN rather than a join leads to an easier plan.

Once the easier plan is in use the indexes still aren't that helpful, as
it chooses the index on 'date', but it still has to filter based on the
prefix, so we add an index on (prefix_id, date) so that it can select the
right rows as quickly as possible.

Lastly lazy evaluation was causing 50 queries for the 'oopsinfestation'
info, so we use select_related() to include that in the main query.

There's a test that we do 1 query for the last point, but no test for
the speed of the query.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~james-w/python-oops-tools/recent-oopses/+merge/138356

Hi,

The recent oops views are performing terribly, because the production
data makes the query really slow.

Firstly, lazy evaluation means that we are doing 50 queries for the
oopsinfestation info at render time. I've added a select_related
to avoid that.

Secondly, the query is just currently slow.

We start of with a plan like: (using some data generated locally rather
than the real data, but the plan matches)

 Limit  (cost=5.37..537.81 rows=50 width=2017) (actual time=0.212..145.954 
rows=50 loops=1)
   ->  Nested Loop  (cost=5.37..214534.09 rows=20146 width=2017) (actual 
time=0.210..145.942 rows=50 loops=1)
         ->  Nested Loop  (cost=5.37..133934.13 rows=20185 width=1978) (actual 
time=0.175..145.566 rows=50 loops=1)
               Join Filter: (oops_prefix.id = oops_oops.prefix_id)
               ->  Index Scan Backward using oops_oops_date on oops_oops  
(cost=0.00..115751.56 rows=1211106 width=1978) (actual time=0.049..59.086 
rows=216002 loops=1)
               ->  Materialize  (cost=5.37..15.98 rows=1 width=8) (actual 
time=0.000..0.000 rows=1 loops=216002)
                     ->  Hash Join  (cost=5.37..15.98 rows=1 width=8) (actual 
time=0.109..0.111 rows=1 loops=1)
                           Hash Cond: (oops_report_prefixes.prefix_id = 
oops_prefix.id)
                           ->  Bitmap Heap Scan on oops_report_prefixes  
(cost=4.33..14.88 rows=10 width=4) (actual time=0.045..0.045 rows=1 loops=1)
                                 Recheck Cond: (report_id = 7)
                                 ->  Bitmap Index Scan on 
oops_report_prefixes_report_id  (cost=0.00..4.33 rows=10 width=0) (actual 
time=0.018..0.018 rows=2 loops=1)
                                       Index Cond: (report_id = 7)
                           ->  Hash  (cost=1.02..1.02 rows=2 width=4) (actual 
time=0.033..0.033 rows=10 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on oops_prefix  (cost=0.00..1.02 
rows=2 width=4) (actual time=0.017..0.022 rows=10 loops=1)
         ->  Index Scan using oops_infestation_pkey on oops_infestation  
(cost=0.00..3.98 rows=1 width=39) (actual time=0.006..0.007 rows=1 loops=50)
               Index Cond: (id = oops_oops.oopsinfestation_id)
 Total runtime: 146.193 ms

This is doing a join that is making the plan fairly costly, but we are 
typically only going to
be joining for one or a few rows of the other table. If we instead issue a 
query for the
few rows, the use a simple IN for the foreign key relation we can get a much 
simpler
query (again with local data):

 Limit  (cost=0.00..991.15 rows=50 width=2017) (actual time=0.061..65.703 
rows=50 loops=1)
   ->  Nested Loop  (cost=0.00..162944.88 rows=8220 width=2017) (actual 
time=0.060..65.697 rows=50 loops=1)
         ->  Index Scan Backward using oops_oops_date on oops_oops  
(cost=0.00..118756.07 rows=8236 width=1978) (actual time=0.034..65.311 rows=50 
loops=1)
               Filter: (prefix_id = 9)
         ->  Index Scan using oops_infestation_pkey on oops_infestation  
(cost=0.00..5.35 rows=1 width=39) (actual time=0.006..0.006 rows=1 loops=50)
               Index Cond: (id = oops_oops.oopsinfestation_id)
 Total runtime: 65.784 ms

which has a worse estimate, but better actual. I've tested this version of the 
query on production,
and it has the same characteristic (and drops the actual from 0.158..31828.728 
to 80.303..459.145).

Next we see that the index scan on oops_oops_date is the majority of the time, 
and it has
a filter, so we try adding an index that also includes the filter criteria:

 Limit  (cost=0.00..342.44 rows=50 width=2017) (actual time=0.085..0.531 
rows=50 loops=1)
   ->  Nested Loop  (cost=0.00..56297.00 rows=8220 width=2017) (actual 
time=0.084..0.519 rows=50 loops=1)
         ->  Index Scan Backward using oops_oops_prefix_date on oops_oops  
(cost=0.00..12108.20 rows=8236 width=1978) (actual time=0.059..0.153 rows=50 
loops=1)
               Index Cond: (prefix_id = 9)
         ->  Index Scan using oops_infestation_pkey on oops_infestation  
(cost=0.00..5.35 rows=1 width=39) (actual time=0.006..0.006 rows=1 loops=50)
               Index Cond: (id = oops_oops.oopsinfestation_id)
 Total runtime: 0.614 ms

This is much better, with the estimate now being lower than the starting query, 
and
the actual being far far quicker. I can't test this on production, so it may not
have the same effect there, but I think it's worth a try.

Overall I think these changes should significantly improve the render time on 
production,
and if it isn't a sufficient improvement we can see where the problem areas 
remain.

This is my first time trying query optimisation, so I may have made some 
glaring errors,
please point them out if I have.

Thanks,

James
-- 
https://code.launchpad.net/~james-w/python-oops-tools/recent-oopses/+merge/138356
Your team Launchpad code reviewers is requested to review the proposed merge of 
lp:~james-w/python-oops-tools/recent-oopses into lp:python-oops-tools.
=== added file 'src/oopstools/oops/migrations/0023_add_prefix_date_index.py'
--- src/oopstools/oops/migrations/0023_add_prefix_date_index.py	1970-01-01 00:00:00 +0000
+++ src/oopstools/oops/migrations/0023_add_prefix_date_index.py	2012-12-06 01:38:20 +0000
@@ -0,0 +1,92 @@
+# encoding: utf-8
+import datetime
+from south.db import db
+from south.v2 import SchemaMigration
+from django.db import models
+
+
+class Migration(SchemaMigration):
+
+    def forwards(self, orm):
+        
+        # Adding index on 'Oops', fields ['date']
+        db.create_index('oops_oops', ['prefix_id', 'date'])
+
+    def backwards(self, orm):
+        
+        # Removing index on 'Oops', fields ['date']
+        db.delete_index('oops_oops', ['prefix_id', 'date'])
+
+    models = {
+        'oops.appinstance': {
+            'Meta': {'object_name': 'AppInstance'},
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'title': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '100'})
+        },
+        'oops.classification': {
+            'Meta': {'object_name': 'Classification'},
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'title': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '100'})
+        },
+        'oops.dboopsrootdirectory': {
+            'Meta': {'object_name': 'DBOopsRootDirectory'},
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'last_date': ('django.db.models.fields.DateField', [], {'null': 'True'}),
+            'last_oops': ('django.db.models.fields.CharField', [], {'max_length': '200', 'null': 'True'}),
+            'root_dir': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '200'})
+        },
+        'oops.infestation': {
+            'Meta': {'unique_together': "(('exception_type', 'exception_value'),)", 'object_name': 'Infestation'},
+            'bug': ('django.db.models.fields.PositiveIntegerField', [], {'null': 'True'}),
+            'exception_type': ('django.db.models.fields.CharField', [], {'max_length': '100'}),
+            'exception_value': ('django.db.models.fields.CharField', [], {'max_length': '1000'}),
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'})
+        },
+        'oops.oops': {
+            'Meta': {'object_name': 'Oops'},
+            'appinstance': ('django.db.models.fields.related.ForeignKey', [], {'to': "orm['oops.AppInstance']"}),
+            'classification': ('django.db.models.fields.related.ForeignKey', [], {'to': "orm['oops.Classification']", 'null': 'True'}),
+            'date': ('django.db.models.fields.DateTimeField', [], {'db_index': 'True'}),
+            'duration': ('django.db.models.fields.FloatField', [], {'null': 'True'}),
+            'http_method': ('django.db.models.fields.CharField', [], {'max_length': '10', 'null': 'True'}),
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'informational': ('django.db.models.fields.NullBooleanField', [], {'null': 'True', 'blank': 'True'}),
+            'is_bot': ('django.db.models.fields.NullBooleanField', [], {'db_index': 'True', 'null': 'True', 'blank': 'True'}),
+            'is_local_referrer': ('django.db.models.fields.NullBooleanField', [], {'db_index': 'True', 'null': 'True', 'blank': 'True'}),
+            'most_expensive_statement': ('django.db.models.fields.CharField', [], {'max_length': '200', 'null': 'True', 'db_index': 'True'}),
+            'oopsid': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '200'}),
+            'oopsinfestation': ('django.db.models.fields.related.ForeignKey', [], {'to': "orm['oops.Infestation']"}),
+            'pageid': ('django.db.models.fields.CharField', [], {'max_length': '500'}),
+            'pathname': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '200'}),
+            'prefix': ('django.db.models.fields.related.ForeignKey', [], {'to': "orm['oops.Prefix']"}),
+            'referrer': ('django.db.models.fields.URLField', [], {'max_length': '500', 'null': 'True'}),
+            'statements_count': ('django.db.models.fields.PositiveIntegerField', [], {}),
+            'time_is_estimate': ('django.db.models.fields.NullBooleanField', [], {'null': 'True', 'blank': 'True'}),
+            'total_time': ('django.db.models.fields.PositiveIntegerField', [], {}),
+            'url': ('django.db.models.fields.URLField', [], {'max_length': '500', 'null': 'True', 'db_index': 'True'}),
+            'user_agent': ('django.db.models.fields.CharField', [], {'max_length': '200', 'null': 'True'})
+        },
+        'oops.prefix': {
+            'Meta': {'object_name': 'Prefix'},
+            'appinstance': ('django.db.models.fields.related.ForeignKey', [], {'to': "orm['oops.AppInstance']"}),
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'value': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '100'})
+        },
+        'oops.pruneinfo': {
+            'Meta': {'object_name': 'PruneInfo'},
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'pruned_until': ('django.db.models.fields.DateTimeField', [], {})
+        },
+        'oops.report': {
+            'Meta': {'object_name': 'Report'},
+            'active': ('django.db.models.fields.BooleanField', [], {'default': 'False'}),
+            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
+            'name': ('django.db.models.fields.CharField', [], {'unique': 'True', 'max_length': '20'}),
+            'prefixes': ('django.db.models.fields.related.ManyToManyField', [], {'to': "orm['oops.Prefix']", 'symmetrical': 'False'}),
+            'recipient': ('django.db.models.fields.EmailField', [], {'max_length': '254', 'null': 'True'}),
+            'summary': ('django.db.models.fields.CharField', [], {'max_length': '20'}),
+            'title': ('django.db.models.fields.CharField', [], {'max_length': '50'})
+        }
+    }
+
+    complete_apps = ['oops']

=== modified file 'src/oopstools/oops/test/test_report.py'
--- src/oopstools/oops/test/test_report.py	2012-12-04 15:59:12 +0000
+++ src/oopstools/oops/test/test_report.py	2012-12-06 01:38:20 +0000
@@ -52,11 +52,13 @@
         infestation = Infestation.objects.create(
                 exception_type=self.getUniqueString(prefix="exc_type"),
                 exception_value=self.getUniqueString(prefix="exc_value"))
+        oopsid = self.getUniqueString(prefix='OOPS-')
+        pathname = self.getUniqueString(prefix='pathname')
         return Oops.objects.create(
                 prefix=prefix, classification=classification,
                 oopsinfestation=infestation, statements_count=100,
                 appinstance=prefix.appinstance, total_time=3,
-                date=datetime.now())
+                date=datetime.now(), oopsid=oopsid, pathname=pathname)
 
     def make_prefix(self):
         appinstance = AppInstance.objects.create(
@@ -75,3 +77,13 @@
         self.assertEqual(200, resp.status_code)
         self.assertQuerysetEqual(
             resp.context['recent'].object_list, [oops.pk], transform=attrgetter('pk'))
+
+    def test_number_of_queries(self):
+        report_name = 'areport'
+        report = Report.objects.create(name=report_name, active=True)
+        prefix = self.make_prefix()
+        report.prefixes.add(prefix)
+        self.make_oops(prefix)
+        self.assertNumQueries(4, self.client.get, '/reports/%s/' % (report_name,))
+        self.make_oops(prefix)
+        self.assertNumQueries(4, self.client.get, '/reports/%s/' % (report_name,))

=== modified file 'src/oopstools/oops/views.py'
--- src/oopstools/oops/views.py	2012-11-29 21:41:40 +0000
+++ src/oopstools/oops/views.py	2012-12-06 01:38:20 +0000
@@ -135,8 +135,10 @@
         dates = []
         for day in range(1, 11):
             dates.append(now - timedelta(day))
+        prefix_ids = list(r.prefixes.values_list('pk', flat=True))
         oopses = Oops.objects.filter(
-                prefix__report=r).order_by('-date')
+                prefix__in=prefix_ids).select_related(
+                    'oopsinfestation').order_by('-date')
         paginator = Paginator(oopses, 50)
         recent_oopses = get_page_from_query_args(paginator, request.GET)
         data = {

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-reviewers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-reviewers
More help   : https://help.launchpad.net/ListHelp

Reply via email to