#24614: Symmetry for select_related/prefetch_related for handling non-related
fields
----------------------------------------------+--------------------
     Reporter:  Naddiseo                      |      Owner:  nobody
         Type:  Uncategorized                 |     Status:  new
    Component:  Database layer (models, ORM)  |    Version:  master
     Severity:  Normal                        |   Keywords:
 Triage Stage:  Unreviewed                    |  Has patch:  0
Easy pickings:  0                             |      UI/UX:  0
----------------------------------------------+--------------------
 I've been looking at the generated queries from some of our code and
 noticed a few things about select_related and prefetch_related. To start
 with, I want to provide some code examples for reference:

 {{{#!python

 from django.db import models

 class City(models.Model):
     name = models.CharField(max_length = 50)
     population = models.PositiveIntegerField()
     mayor = models.ForeignKey('Person', null=True, default=None)

 class Person(models.Model):
     name = models.CharField(max_length = 50)
     hometown = models.ForeignKey(City)

 def initial_data():
     c1 = City(name='City1', population=2)
     c1.save()
     p1 = Person(name='Citizen One', hometown=c1)
     p1.save()
     p2 = Person(name='Mayor', hometown=c1)
     p2.save()
     c1.mayor = p2
     c1.save(update_fields=['mayor'])
 }}}

 {{{
 # ==== Current behaviour (with abbreviated sql) ====

 # --- Select related at depth=1 ---
 >>> p = Person.objects.select_related('hometown').get(id=1)
 # SELECT p.*,  c.* FROM app_person p INNER JOIN app_city c ON
 p.hometown_id = c.id WHERE p.id=1;
 >>> print(p.hometown.name) # No query
 City1
 >>> print(p.hometown.population) # No query
 2
 >>> print(p.hometown.mayor)
 # SELECT p.* FROM app_person p WHERE p.id = 2
 Person(Mayor)

 # --- Select related on a non-rel field ---
 p = Person.objects.select_related('hometown__name').get(id=1)
 # SELECT p.*,  c.* FROM app_person p INNER JOIN app_city c ON
 p.hometown_id = c.id WHERE p.id=1;
 >>> print(p.hometown.name) # No query
 City1
 >>> print(p.hometown.population) # No query
 2
 >>> print(p.hometown.mayor)
 # SELECT p.* FROM app_person p WHERE p.id = 2
 Person(Mayor)

 # --- Select related on a rel-rel field ---
 p = Person.objects.select_related('hometown__mayor').get(id=1)
 # SELECT p1.*,  c.*,  p2.* FROM app_person p1 INNER JOIN app_city c ON
 p1.hometown_id = c.id LEFT OUTER JOIN app_person p2 ON c.mayor_id = p2.id
 WHERE p1.id = 1
 >>> print(p.hometown.name) # No query
 City1
 >>> print(p.hometown.population) # No query
 2
 >>> print(p.hometown.mayor) # No query
 Person(Mayor)

 # --- Prefetch related on a related field ---
 >>> p = Person.objects.prefetch_related('hometown').get(id=1)
 # SELECT p.* FROM app_person p WHERE p.id = 1
 # SELECT c.* FROM app_city c WHERE c.id IN (...)
 >>> print(p.hometown.name) # No query
 City1
 >>> print(p.hometown.population) # No query
 2
 >>> print(p.hometown.mayor)
 # SELECT p.* FROM app_person p WHERE p.id = 2
 Person(Mayor)

 # --- Prefetch related on a non-rel field ---
 >>> p = Person.objects.prefetch_related('hometown__name').get(id=1)
 # SELECT p.* FROM app_person p WHERE p.id = 1
 # SELECT c.* FROM app_city c WHERE c.id IN (...)
 ValueError: 'hometown__name' does not resolve to an item that supports
 prefetching - this is an invalid parameter to prefetch_related().

 # --- Prefetch related on a rel-rel field ---
 >>> p = Person.objects.prefetch_related('hometown__name').get(id=1)
 # SELECT p.* FROM app_person p WHERE p.id = 1
 # SELECT c.* FROM app_city c WHERE c.id IN (...)
 # SELECT p.* FROM app_person p WHERE p.id = 2
 >>> print(p.hometown.name) # No query
 City1
 >>> print(p.hometown.population) # No query
 2
 >>> print(p.hometown.mayor) # No query
 Person(Mayor)
 }}}

 There are two things in the example I want to bring up:

 The first is that `select_related` and `prefetch_related` differ in how
 they handle their arguments if they're invalid (ie, when they are pointing
 to non-related fields), `select_related` will happily accept the argument
 and just fetch the table parts, whereas `prefetch_related` will throw a
 `ValueError`. I think both methods should either throw the `ValueError`,
 or both sanitize the input :- by the same reason given in
 > django/db/models/query.py line ~1508
 > Last one, this *must* resolve to something that supports prefetching,
 otherwise there is no point adding it and the developer asking for it has
 made a mistake."

 The second is that I believe there is some semantic overloading going on
 for the argument format for
 `prefetch_related`/`select_related`/`only`/`defer`, I think it's best
 explained in a table:

 || `"model__attr"`                  ||= `prefetch_related`
 =||= `select_related`                                           =||=
 `only/defer` =||= `only/defer` + `select_related` =||
 ||attr is Related field           ||Extra query (expected), selects all
 fields||Joins field (expected), selects all fields      ||affects what is
 in the select clause (expected), but ignores "attr" and only selects
 "model"  ||affects what is in the select clause (expected), and only
 selects "attr" from "model" (expected) ||
 ||attr is non-relational field||ValueError
 ||Ignores the field, selects all fields on table ||affects what is in the
 select clause (expected), but ignores "attr" and only selects "model"  ||
 affects what is in the select clause (expected), and only selects "attr"
 from "model" (expected) ||

 There seems to be a two semantic overlap issues. First is with the name of
 `select_related` in that it doesn't really affect which fields are
 "selected", only really which tables are joined, second is the semantic
 overlap between the argument formats which `selected_related` accepts and
 which formats `only/defer` accept. The main problem I perceive is that
 both accept arguments in the format `"model__attr"` yet treat them
 differently. On one hand, `prefetch_related` and `select_related` are
 supposed to only deal with arguments where `"__attr"` is a relational
 field (prefetch does correctly), on the other hand, `only/defer` accept
 arguments where `"__attr"` is both a relational field, or just a normal
 attribute field. To me, at least, these are two distinct data types, yet
 when I'm looking through code they are represented in exactly the same
 manner. Obviously, there isn't any other practical way to differentiate
 the two data types, so there should probably be changes made to how
 they're handled.

 Originally, I was going to propose that if a non-relational field was
 detected in `prefetch_related` or `select_related`, then it is passed to
 an `only()`, since when I see the name "select_related", I associate it
 with the fields "SELECT"ed, thus `A.objects.select_related('b__attr')`
 would behave the same as `A.objects.select_related('b').only('b__attr',
 'aAttr1', 'aAttr2').defer('b__otherattr')`, however, after trying to
 reason about it, I'm not sure it's a good idea since it further
 exacerbates the symmetry issues.
 Pros:
 - More succinct code for optimizing which fields are selected
 - `select_related` accepts the same field types as `only/defer`
 - Allows more fine grained control to which fields are `SELECT`ed (fields
 in the `select_related` will be the ones actually selected for those
 related tables)
 Cons/Concerns:
 - Behaviour is less like `prefetch_related`
 - Possibly breaking change for `select_related` on relational fields.
 - Doesn't really make sense for `prefetch_related`, thus the symmetry
 issue between `prefetch_related` and `select_related`.
 - What is selected for `A.objects.select_related('b__c')` where `c` is
 also a related field? Is it all fields on `c` or just its id?

 Perhaps a better approach, although probably most controversial, would be
 to rename `select_related` to `join_related` and have it throw a
 `ValueError` on non-relational fields. This would resolve the semantic
 overloading of the method name and implications of its implementations,
 however, it doesn't resolve the second - although admittedly more minor -
 the previously mentioned argument/data types/semantics issue.

 Anyway, I'm hoping a discussion can be started on this, but at very
 minimum I think `select_related` should match the behaviour of
 `prefetch_related` when it encounters an argument that does not resolve to
 a relational field. Let me know if a separate issue should be reported.

--
Ticket URL: <https://code.djangoproject.com/ticket/24614>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/051.db327d7bb2e063f06dde68e92d805cbd%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to