Re: Performance when retrieving data from related models

Peter of the Norse Mon, 18 Mar 2013 19:09:34 -0700

More inline stuff. 

On Mar 18, 2013, at 4:14 AM, Taras D wrote:


> Hi Peter
> 
> On 18 March 2013 16:35, Peter of the Norse <[email protected]> wrote:
> You’re over thinking this by a large margin. getTendersForContractor should 
> return a QuerySet. That way you can add things like .select_related as 
> necessary in the view. 
> 
> Also, I’m a bit worried about the fact you have a function called 
> getTendersForContractor. It really should be Tender.objects.forContractor. Or 
> if the function is just a single filter call, it can be eliminated entirely.
> 
> To confirm, this would require a new Manager that implements the 
> 'forContractor' method?

Yes. 

>> * Prefetch_related selects related obects via the criteria: 'in (list of 
>> ids'), which arguably can be slower
> 
> What database are you using where this is true?
> 
> It was the result of a discussion with a colleague. Also slides 67 & 68 at 
> http://www.slideshare.net/OReillyOSCON/unbreaking-your-django-application 
> imply that large IN's may cause performance problems - 
> "68:  Large INs are a mess.  Very expensive for the database to parse.  Very 
> expensive for the database to execute.  If there are potentially more than 
> 10-15 items in the list, rework the IN as a JOIN against whatever the source 
> of the keys is." 

That sounds like a good candidate for a bug ticket. It’s not trivial though. 
The limit of IDs in the query set would depend on the database, the complexity 
of the query, and the data being pulled. I would have to think about this for a 
long time before I felt comfortable making any kind of suggestion.

>> * Seems brittle - if you change the data model, you have to change all of 
>> the queries in the data retrieval layer. Denomralising the data would reduce 
>> this, but we'd still have the problem of retrieving more data than the 
>> client may possibly want
> Yes, if you change code in one place, you might have to change it someplace 
> else to match it. Most of us just use the ORM as the data retrieval layer. 
> Again, I’m worried you’re over thinking this.
> 
> I hadn't really considered passing around a QuerySet in place of a list of 
> Tenders. One thing I'm finding is that the cost in retrieving related objects 
> is making me rethink the API presented to the client. For example, consider 
> the following API, which I think is quite reasonable:
> 
> class Tender:
>       def getTotalCost(self):
>           return self.project.getTotalCost()
> 
> class Project:
>       
>       def getTotalCost(self):
>           cost = 0
>           for contract in self.contract_set.all():
>               cost += constract.getTotalCost()
>           return cost
> 
> similarly implementations for Project & Contract
> 
> class Contract:
>       def getTotalCost(self):
>           cost = 0
>           for job in self.job_set.all():
>               cost += job.getTotalCost()
>           return cost

https://docs.djangoproject.com/en/dev/topics/db/aggregation/

> Having a Project know how much its total cost is, and presenting this to a 
> client, is in line with OO principles of abstraction and encapsulation. 
> Consider a page that outputs the total cost of each contract tendered for a 
> particular contractor. If we re-use the 'getTendersForContractor' function 
> (that returns a QuerySet), and then call 'getTotalCost' on each Project that 
> has a tender, then we still have a problem in that we'll get a multitude of 
> calls to the database. 
> 
> We might put the onus on the client to call:
> qs = getTendersForContractor(contractor)
> for tender in list(qs.prefetch_related('project', 'project__contract', 
> 'project__contract')):
>     print tender, tender.getTotalCost()

If this is really what you are doing, then I strongly recommend 
https://docs.djangoproject.com/en/dev/topics/db/aggregation/. This case is a 
single SQL statement.

> Not only is this complicated, but this requires the client know how the 
> Tender is implemented (the prefetched would become uneccessary if we changed 
> the implementation to store the total cost on the actual Tender itself at the 
> cost of denormalised data). Note that you wouldn't have these problems if the 
> data existed in memory.
> 
> These points are making me rethink my data model and API design, and that the 
> OO paradigm doesn't always fit well with web apps. In particular the slides 
> at http://www.slideshare.net/OReillyOSCON/unbreaking-your-django-application:
> 
> 23: "Django doesn’t have a good primitive for returning an object graph in 
> one call. - multiple db trips required" Implies that object graphs like what 
> I've designed is difficult to implement effeciently..?
> 
> 24: "So, what do to?  If you almost always return the parent along with the 
> related object, consider object inheritance instead.  That does do a nice 
> join operation.  Don’t create an insane class graph, though.  If you 
> frequently use the related object without reference to the parent object, use 
> a foreign key." Not sure what 'object inheritance' is but might be useful, 
> and concerned that I might be heading towards an 'insane class graph' (the 
> real data model has 6 or 7 more related objects). Am I heading down the wrong 
> path of embracing an OO style API and data model?
> 
> Taras 

I don’t know. But I think you have to acknowledge that there is such a thing as 
too OOP. I’m not sure that creating an API on top of Django is a good idea. 
Django is already super high level. I would have to ask myself how I expect my 
users to use what I give them. On some level, you’ll have to trust your user to 
ask for the data they need.

Peter of the Norse
[email protected]



-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Performance when retrieving data from related models

Reply via email to