#28422: Allow adding joins to other querysets (or models) to a queryset with 
extra
join conditions
--------------------------------+--------------------------------------
     Reporter:  Debanshu Kundu  |                    Owner:  nobody
         Type:  Uncategorized   |                   Status:  new
    Component:  Uncategorized   |                  Version:  1.11
     Severity:  Normal          |               Resolution:
     Keywords:                  |             Triage Stage:  Unreviewed
    Has patch:  0               |      Needs documentation:  0
  Needs tests:  0               |  Patch needs improvement:  0
Easy pickings:  0               |                    UI/UX:  0
--------------------------------+--------------------------------------
Description changed by Debanshu Kundu:

Old description:

> In one of our project we had a need to join our sub-queries to our main
> query. As Django ORM doesn't support this we had to write SQL queries.
> But after some time those SQL queries became difficult to maintain and
> our project was under active development and we were doing
> additions/changes to models and query logic.
>
> So we started to look for alternatives and found some ways to hack Django
> ORM using which we can add joins to queryset. I have created this gist
> with utility functions and related helper code which are we using to add
> join to another queryset (or model) to a queryset:
> https://gist.github.com/debanshuk/6fd9398cff0fab59e7093fe98b8a9152.
>
> These functions are named `join_to_queryset()` and `join_to_table()`
> respectively. They also allow adding extra conditions to the join added
> by them using `get_active_extra_restriction()` helper function.
>
> Following is an example showing use of `join_to_queryset()` function:
>
> {{{
> class Snake(models.Model):
>     name = models.TextField()
>     age = models.PositiveIntegerField()
>     length = models.FloatField()
>     sex = models.TextField()
>
> class Egg(models.Model):
>     snake = models.ForeignKey(Snake)
>
> class Kill(models.Model):
>     snake = models.ForeignKey(Snake)
>
> queryset = Snake.objects.filter(pk__in=snake_pks)
>
> queryset = join_to_queryset(
>     table=Snake,
>     subquery=Egg.objects.values('snake').annotate(egg_count=Count('pk')),
>     table_field='id',
>     subquery_field='snake_id',
>     queryset=queryset,
>     alias='SnakeEggAggr'
> ).extra(select={'egg_count': 'SnakeEggAggr.egg_count'})
>
> queryset = join_to_queryset(
>     table=Snake,
> subquery=Kill.objects.values('snake').annotate(kill_count=Count('pk')),
>     table_field='id',
>     subquery_field='snake_id',
>     queryset=queryset,
>     alias='SnakeKillAggr'
> ).extra(select={'kill_count': 'SnakeKillAggr.kill_count'})
>
> print queryset.values('name', 'age', 'length', 'sex', 'egg_count',
> 'kill_count')
> }}}
>
> Output of above code would be something like:
> {{{
> [{'name': 'John', 'age': 5, 'length': 20.1, 'sex': 'male', 'egg_count':
> 10, 'kill_count': 5}, {'name': 'Jane', 'age': 8, 'length': 25.5, 'sex':
> 'female', 'egg_count': 5, 'kill_count': 1}, {'name': 'Jack', 'age': 2,
> 'length': 10.2, 'sex': 'trans', 'egg_count': 0, 'kill_count': 0}]
> }}}
>

> Above result can also be obtained by doing following query:
>
> {{{
> print
> Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
> kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
> 'egg_count', 'kill_count''egg_count', 'kill_count')
> }}}
>
> But this query will take more time to execute than previous one as
> 'name', 'age', 'length' and 'sex' all four fields would be in the GROUP
> BY clause of SQL query and the time will increase more and more as the
> number of such fields increases (this is the vary reason due to which we
> had to use sub-queries for aggreagation).
>
> It would be nice if such functionality can be added to the Django ORM
> itself. It seems doable to as we were able to hack the ORM to do the
> same.

New description:

 In one of our project we had a need to join our sub-queries to our main
 query. As Django ORM doesn't support this we had to write SQL queries. But
 after some time those SQL queries became difficult to maintain and our
 project was under active development and we were doing additions/changes
 to models and query logic.

 So we started to look for alternatives and found some ways to hack Django
 ORM using which we can add joins to queryset. I have created this gist
 with utility functions and related helper code which are we using to add
 join to another queryset (or model) to a queryset:
 https://gist.github.com/debanshuk/6fd9398cff0fab59e7093fe98b8a9152.

 These functions are named `join_to_queryset()` and `join_to_table()`
 respectively. They also allow adding extra conditions to the join added by
 them using `get_active_extra_restriction()` helper function.

 Following is an example showing use of `join_to_queryset()` function:

 {{{
 class Snake(models.Model):
     name = models.TextField()
     age = models.PositiveIntegerField()
     length = models.FloatField()
     sex = models.TextField()

 class Egg(models.Model):
     snake = models.ForeignKey(Snake)

 class Kill(models.Model):
     snake = models.ForeignKey(Snake)

 queryset = Snake.objects.filter(pk__in=snake_pks)

 queryset = join_to_queryset(
     table=Snake,
     subquery=Egg.objects.values('snake').annotate(egg_count=Count('pk')),
     table_field='id',
     subquery_field='snake_id',
     queryset=queryset,
     alias='SnakeEggAggr'
 ).extra(select={'egg_count': 'SnakeEggAggr.egg_count'})

 queryset = join_to_queryset(
     table=Snake,
 subquery=Kill.objects.values('snake').annotate(kill_count=Count('pk')),
     table_field='id',
     subquery_field='snake_id',
     queryset=queryset,
     alias='SnakeKillAggr'
 ).extra(select={'kill_count': 'SnakeKillAggr.kill_count'})

 print queryset.values('name', 'age', 'length', 'sex', 'egg_count',
 'kill_count')
 }}}

 Output of above code would be something like:
 {{{
 [{'name': 'John', 'age': 5, 'length': 20.1, 'sex': 'male', 'egg_count':
 10, 'kill_count': 5}, {'name': 'Jane', 'age': 8, 'length': 25.5, 'sex':
 'female', 'egg_count': 5, 'kill_count': 1}, {'name': 'Jack', 'age': 2,
 'length': 10.2, 'sex': 'hermaphrodite', 'egg_count': 0, 'kill_count': 0}]
 }}}


 Above result can also be obtained by doing following query:

 {{{
 print
 Snake.objects.filter(pk__in=snake_pks).annotate(egg_count=Count('egg__id'),
 kill_count=Count('kill__id')).values('name', 'age', 'length', 'sex',
 'egg_count', 'kill_count''egg_count', 'kill_count')
 }}}

 But this query will take more time to execute than previous one as 'name',
 'age', 'length' and 'sex' all four fields would be in the GROUP BY clause
 of SQL query and the time will increase more and more as the number of
 such fields increases (this is the vary reason due to which we had to use
 sub-queries for aggreagation).

 It would be nice if such functionality can be added to the Django ORM
 itself. It seems doable to as we were able to hack the ORM to do the same.

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28422#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/067.b8c5ab666da82279998651bfa0875b8d%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to