My web app hits the database up to 3k per page because it has to retrieve
data from lots of related models. It has a fairly normalised database with
a structure similar to:
- one project has many contracts
- one project has many tenders (a tender is put forward by a contractor)
- one contract has many jobs
(there are a few extra relations not shown for simplicity). One of the
pages, for example, lists out details for every contract a contractors has
tendered). I've naively implemented this by treating the Django models as I
would OO models, resulting in about 3k database hits for the one page. This
is because something like:
qs = Tender.objects.filter(filter_criteria)
for tender in qs:
print tender.getProject()
where getProject() looks something like:
def getProject(self):
return self.project
I think here that the extra level of indirection means that Django can't
conclude that all 'project' objects for each tender from the query set
should be retrieved, and instead it ends up hitting the database once per
loop. At the moment I do this in quite a few places, contributing to the 3k
database hits.
One way of of solving this is for the query to fetch all related data,
something like:
qs = Tender.objects.sql_related(/*names of keyed relationships
here*/).prefetch_related('project', 'project__job_set') (the actual example
has many more relations in the prefetch_related). This fixes the problem of
hitting the database in loops such as the above, and the complexity can be
hidden in a data retrieval layer that the client calls into.
returnedTenders = getTendersForContractor(contractor) # << the actual query
with prefetches etc is done in this function
However, there are a few drawbacks to this:
* Major one is that data is fetched that you might not require. EG: if
there was another page that just listed the ids of the Tenders, then a lot
of data would have been fetched that is not required
** you could mitigate this by having the client somehow specify which data
requires prefetching, but
*** this exposes implementation detail to the client
*** the function calling into the data retrieval layer may not know what
data requires prefetching, and instead it is a function higher up in the
call chain (the function that actually uses the data) that has this
information
* Each prefetch_related hits the database once - a lot better than 3k hits,
but not as minimal as a sql join which joins all of the tables
* Prefetch_related selects related obects via the criteria: 'in (list of
ids'), which arguably can be slower
* Seems brittle - if you change the data model, you have to change all of
the queries in the data retrieval layer. Denomralising the data would
reduce this, but we'd still have the problem of retrieving more data than
the client may possibly want
What is the group's suggestions/advice?
--
You received this message because you are subscribed to the Google Groups
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.