Re: Reducing the number of my view's database queries

erikcw Fri, 12 Dec 2008 10:29:50 -0800

Thanks for all of the responses.  Just wanted to share where I'm at at
this point:


self.profilekeyword_set.select_related().all().extra(select={
        'rank_url': 'SELECT url FROM rankreport_keywordrank WHERE
rankreport_keywordrank.keyword_id =
rankreport_profilekeyword.keyword_id AND
rankreport_keywordrank.site_id = rankreport_profile.site_id ORDER
BY `created` DESC LIMIT 0,1',        'competition': 'SELECT
competition FROM rankreport_keywordrank WHERE
rankreport_keywordrank.keyword_id =
rankreport_profilekeyword.keyword_id AND
rankreport_keywordrank.site_id = rankreport_profile.sit
e_id ORDER BY `created` DESC LIMIT 0,1',
        'rank': 'SELECT rank FROM rankreport_keywordrank WHERE
rankreport_keywordrank.keyword_id =
rankreport_profilekeyword.keyword_id AND
rankreport_keywordrank.site_id = rankreport_profile.site_id ORDER BY
`created` DESC LIMIT 0,1',
        'start_rank': 'SELECT rank FROM rankreport_keywordrank WHERE
rankreport_keywordrank.keyword_id =
rankreport_profilekeyword.keyword_id AND
rankreport_keywordrank.site_id = rankreport_profile.site_id ORD
ER BY `created` ASC LIMIT 0,1',
        'avg': 'SELECT search_volume_avg FROM
rankreport_keywordmetrics WHERE rankreport_keywordmetrics.keyword_id =
rankreport_profilekeyword.keyword_id ORDER BY `created` DESC LIMIT
0,1',
        'cpc': 'SELECT cpc FROM rankreport_keywordmetrics WHERE
rankreport_keywordmetrics.keyword_id =
rankreport_profilekeyword.keyword_id ORDER BY `created` DESC LIMIT
0,1',
        'zz': 'start_rank - rank',
        }).select_related()


This seems to work.  If you know of any performance tweaks I'd love to
hear about them (however it is incredibly fast compared to my original
solution).

The only problem is that "zz" throws a database error.
(OperationalError: (1054, "Unknown column 'start_rank' in 'field
list'")) - ant ideas how I can get "zz" working?

Thanks again for the assistance!
Erik

On Dec 8, 4:49 pm, Malcolm Tredinnick <[email protected]>
wrote:
> On Mon, 2008-12-08 at 07:43 -0800, DavidA wrote:
> > If I undestand the problem correctly, in MySQL you could do this in
> > one query as:
>
> > select
> >     m.*,
> >     (select min(created) from model2 where id = m.model2_id) as
> > first_created,
> >     (select max(created) from model2 where id = m.model2_id) as
> > last_created
> > from model1 m
> > ;
>
> That will give the correct answer, however it's a bit inefficient, since
> the inner queries are run once for each model id (and in the original
> post model2 and model3 were used, so it's worse than it looks here).
> Since that means 6000 rows, based on Eric's original numbers, it's worth
> looking deeper. Let me cut-and-paste the original description for
> reference:
>
> (behold the power of a fully-functional, non top-posting email client!)
>
> > > > objects = Model1.objects.filter(user=3).select_related() #about 6,000 
> > > > objects
>
> > > > data = {}
> > > > for o in objects:
> > > >     data[o.name] = [o.field1, o.field2]
> > > >     data[o.name].append(o.field3.model2_set.all().latest('created'))
> > > > #get latest row from related model2
> > > >     data[o.name].append(o.model3_set.all().order_by('created')[0])
> > > > #get earliest row from related model3
>
> Ideally, there would be a way to run one inner query that does something
> like
>
>         select max(created) from ... where model1.user_id = 3 group by
>         model1.id
>
> except that isn't quite right because if the outer query only selected
> results with those created dates, it will get a superset of the right
> answer (some model2.id could have a created date that matches some other
> maximum but isn't the maximum for that particular model2.id, so you need
> to tie the maximum date to the model id).
>
> If something along those lines can be made to work, that inner query
> only has to run once (sure it's a big query, with a sort, but it's only
> one query and it's in the sweet spot of what databases can do). It's the
> difference between a correlated inner query (which you have -- so it
> runs once for each outer row) and an uncorrelated query like the above.
>
> Still, using the above query and then filtering out the dupes at the
> Python level might well be a good alternate solution. So you end up
> with:
>
>         select model2.*, model1.id, ... from ...
>         where model2.created in (select max(model2.created) from ....
>         where model1.user_id=3 group by model1.id) order by
>         model2.created asc
>
> Then you process these results, storing them in a dictionary keyed from
> model1.id. When you get duplicate results for the same model1.id (the
> above superset situation), the 'latest' result will be after the earlier
> ones and will overwrite them.
>
> The above is the query to select all the model2 fields. There's another
> query to select all the model3 fields. So that reduces the original
> 12001 queries to 3 and in the two with nested inner queries, the nested
> queries only run once each (or once per user id, which seemed to be
> fixed in the original problem statement).
>
> You could fairly close to doing this through the ORM as well, although
> it's not clear that it's worth it. Even with aggregate support, here's
> one approach (I'll only worry about the two model situation; extending
> to three is analogous). I'll assume models that look like this (and I'm
> going to change the names to something comprehensible so that we can
> develop some intuition about what's going on):
>
>         class Blog(models.Model):
>            author = models.ForeignKey(User)
>
>         class Entry(models.Model):
>            blog = models.ForeignKey(Blog)
>            created = models.DateTimeField(default=datetime.datetime.now)
>
> So each user has many blogs with many entries and we want to pull the
> latest entries for each blog for a particular user (user_id 3, in this
> case). The inner query and grouping stuff here requires poking at the
> internal API (although the inner query bit is, sadly, documented):
>
>         blog_qs = Blog.objects.filter(user=3)
>
>         inner_query = Entry.objects.filter(blog__user=3).extra(select={"max": 
> "max(created)"}).values("max").query
>         inner_query.group_by = ["blog_id"]
>         entry_qs = 
> Entry.objects.filter(created__in=inner_query).order_by("created")
>
>         # Now merge the results into a dictionary keyed by blog_id.
>         data = dict([(b.id, b) for b in blog_qs])
>         for entry in entry_qs:
>            data[entry.blog_id] = entry
>
> Regards,
> Malcolm
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Reducing the number of my view's database queries

Reply via email to