#5420: Allow database API users to specify the fields to exclude in a SELECT
statement
---------------------------------------------------+------------------------
Reporter: adrian | Owner: jacob
Status: assigned | Milestone: post-1.0
Component: Database layer (models, ORM) | Version: SVN
Resolution: | Keywords: qs-rf
Stage: Accepted | Has_patch: 1
Needs_docs: 1 | Needs_tests: 0
Needs_better_patch: 1 |
---------------------------------------------------+------------------------
Changes (by adunar):
* cc: [EMAIL PROTECTED] (added)
* needs_docs: 0 => 1
Comment:
A few months ago I patched Apture's internal version of Django to support
lazy loading/saving of certain fields within Django models. However, I did
it a different way than already discussed here.
In particular, our problem was that implicit Django-generated query sets
would fetch big text fields that we didn't need. Example:
{{{
#!python
from django.db import models

class Student(models.Model):
    name = models.CharField(max_length=32)
    year = models.IntegerField()
    thesis = models.TextField()

class FavoriteFood(models.Model):
    food = models.CharField(max_length=32)
    reason = models.CharField(max_length=128)
    student = models.ForeignKey(Student)

favorites = FavoriteFood.objects.filter(food='lasagna')
for favorite in favorites:
    print favorite.student.name,
    # Django just loaded the student's entire thesis :(
    print "likes lasagna because", favorite.reason

favorites = FavoriteFood.objects.filter(food='chicken enchiladas').select_related()
# Django just loaded a bunch of theses again :(
}}}
To solve this, we changed the client interface by adding a boolean 'lazy'
parameter to the Field constructor, e.g.:
{{{
#!python
thesis = models.TextField(lazy=True)
}}}
This was implemented by putting a descriptor on lazy fields that keeps
track of whether the field has been loaded yet and whether it has been
modified. By using a descriptor instead of !__setattr!__, it doesn't
really have an impact on performance for models that don't use lazy
fields.
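For illustration, here is a minimal sketch of the descriptor idea. It is not the code from the patch: LazyField, load_thesis, LOAD_LOG and fields_to_save are hypothetical names, and a plain Python class stands in for a Django model.

```python
LOAD_LOG = []  # records each simulated database fetch


def load_thesis(student):
    # Hypothetical loader; a real implementation would query the database.
    LOAD_LOG.append(student.name)
    return "thesis of %s" % student.name


class LazyField(object):
    """Data descriptor that loads a value on first access and notes writes."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            # First access: fetch the value and cache it on the instance.
            instance.__dict__[self.name] = self.loader(instance)
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        instance.__dict__.setdefault('_modified', set()).add(self.name)


class Student(object):
    thesis = LazyField('thesis', load_thesis)

    def __init__(self, name):
        self.name = name

    def fields_to_save(self):
        # Eager fields are always saved; lazy ones only if they were assigned.
        return ['name'] + sorted(self.__dict__.get('_modified', ()))
```

Only attributes that carry the descriptor go through this extra code path; ordinary attributes such as name are still plain instance-dictionary lookups.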
After we migrated our internal Django version to 1.0, I went back and
cleaned up the lazy fields code and added support for changing the lazy
fields on each query set. This turned out to be considerably harder to do
in a way that doesn't degrade performance for clients who don't use lazy
fields, which is probably why this ticket has been open for so long
despite its obvious importance...
The client interface adds one method to the manager and query set,
toggle_fields(fetch=None, lazy=None, fetch_only=None), where each argument
can be a list of field names (or None):
{{{
#!python
# fetches name (assuming thesis was defined with lazy=True)
students = Student.objects.all().toggle_fields(lazy=['year'])
# fetches name,year,thesis
students2 = Student.objects.all().toggle_fields(fetch=['thesis'])
# fetches name, year
students3 = Student.objects.all().toggle_fields(fetch_only=['name', 'year'])
student = students[0]  # keep one instance; each students[0] can return a new object
thesis = student.thesis  # lazy-loads thesis
student.save()  # saves name, year
student.thesis = "Django is awesome"
student.save()  # saves name, year, thesis
}}}
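To make the intended semantics of the three arguments concrete, here is a rough sketch of how the set of fetched fields could be resolved. The function resolve_fetched_fields and its all_fields/default_lazy parameters are hypothetical and are not part of the patch; the rules are only inferred from the three examples above, and the primary key is ignored for simplicity.

```python
def resolve_fetched_fields(all_fields, default_lazy,
                           fetch=None, lazy=None, fetch_only=None):
    """Return the field names to fetch eagerly, in model order.

    all_fields   -- every field name on the model (assumed input)
    default_lazy -- fields declared with lazy=True (assumed input)
    """
    known = set(all_fields)
    for names in (fetch, lazy, fetch_only):
        unknown = set(names or ()) - known
        if unknown:
            raise ValueError("unknown field(s): %s" % ", ".join(sorted(unknown)))

    if fetch_only is not None:
        # fetch_only replaces the defaults entirely.
        wanted = set(fetch_only)
    else:
        # Start from the model's defaults, then apply per-query overrides.
        wanted = known - set(default_lazy)
        wanted |= set(fetch or ())
        wanted -= set(lazy or ())
    return [name for name in all_fields if name in wanted]
```

With all_fields=['name', 'year', 'thesis'] and default_lazy=['thesis'], passing lazy=['year'] yields only name, which matches the first example. How fetch and fetch_only should combine is not specified here, so in this sketch fetch_only simply wins.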
Does anyone have ideas for better names for
toggle_fields/lazy/fetch/fetch_only? I think that hide and show/expose
don't really fit here because the client can still get and set and save
the field values whether it is lazy or not. Also, I thought that having
one method with different parameters would follow Django's style better
than adding three new methods. Another question is whether to allow
lazy-loading and lazy-saving to be independent, e.g., to have a field that's
always loaded but only saved when changed. It probably wouldn't be too
hard to support this.
Internally, when a toggle_fields query is executed, the Django ORM
dynamically creates a subclass of the model type that has a
!LazyDescriptor for each of the fields that are lazy for that query (but
not the fields that were created with lazy=True). Unlike a typical model
subclass, this one is very lightweight. It skips most of the code in
!ModelBase.!__new!__, and shares the same _meta object.
One drawback of dynamically creating subclasses is that they are harder to
serialize (e.g. with the pickle module), but that's probably possible to
support if desired.
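As a rough illustration of the dynamic-subclass approach, here is a sketch using plain Python classes rather than real Django models, so no !ModelBase is involved. The lazy_subclass helper, the cache, and the load_field hook are all hypothetical names, and the !LazyDescriptor below is a simplified stand-in for the one in the patch.

```python
class LazyDescriptor(object):
    """Loads the named field from the instance on first access."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            instance.__dict__[self.name] = instance.load_field(self.name)
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


_subclass_cache = {}


def lazy_subclass(model, lazy_fields):
    """Build (or reuse) a subclass of model with descriptors on lazy_fields."""
    key = (model, tuple(sorted(lazy_fields)))
    if key not in _subclass_cache:
        attrs = dict((name, LazyDescriptor(name)) for name in lazy_fields)
        # _meta is not copied: the subclass inherits the very same object.
        _subclass_cache[key] = type(model.__name__ + 'Lazy', (model,), attrs)
    return _subclass_cache[key]


class Student(object):
    _meta = object()  # stand-in for the model's options object
    loads = []        # records simulated lazy fetches

    def __init__(self, **values):
        self.__dict__.update(values)

    def load_field(self, name):
        # Hypothetical hook; a real version would issue a SELECT.
        Student.loads.append(name)
        return "<%s from database>" % name
```

Caching by (model, lazy field names) keeps repeated queries with the same lazy set from creating a new class each time. Whether the patch caches this way is not stated here.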
Anyway, there shouldn't be much of a performance impact for people not
using toggle_fields or lazy=True. In most cases the code checks to see if
there are any lazy fields before doing anything different from before.
There's just a bit of overhead from some extra conditional tests and
function calls. I haven't actually run performance tests on that though.
As part of my patch, I cleaned up some code in db/models/sql/query.py. In
particular, the code to get a column's SQL alias was duplicated in 6
places. Also, depending on whether the get_default_columns function was
called for the base model or a related model, it took different
parameters, did different things, and returned different values. So I
split it into two separate functions. This refactoring makes the Query
class easier to understand and easier to subclass without duplicating
code. That seems like something Django should incorporate even if people
don't like the other code in this patch.
One change I'm less sure about is in Query.setup_joins. Basically, when
computing the join for a !ForeignKey field like student_id, the old code
would add a join to the student table even though it doesn't need to
(because student_id is stored directly in the original row). Then,
add_fields checked for this case and removed the join. I changed it so
that setup_joins doesn't add the unnecessary join in the first place.
Maybe there's some reason for doing this that I'm not seeing.
Anyway, we're already using this because it makes database access way
faster for models that have big text fields that we don't read or write
very often. My patch has some regression tests, but I haven't updated the
documentation yet. If this is something Django would want to incorporate,
I'd be happy to do some more work to improve the patch. Thoughts?
--
Ticket URL: <http://code.djangoproject.com/ticket/5420#comment:21>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.