#34728: OR operator on queryset does not work as expected
-------------------------------------+-------------------------------------
Reporter: Kbleser | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: needsinfo
Keywords: queryset OR | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):
TL;DR not an issue with queryset combination but another ticket that
relates to how the ORM deals with aggregation when more than one multi-
valued relationship is involved. Either ''invalid'' or ''duplicate'' to
me.
---
> As you can see on the main branch of the repository linked above by the
reporter, the query works as expected if it is rewritten to use Q objects
This happens to work because you perform the aggregation ''before''
filtering the multi-valued relationship which prevents JOIN reuse
[https://docs.djangoproject.com/en/4.2/topics/db/aggregation/#order-of-
annotate-and-filter-clauses as documented] and results in you joining
twice against against the same multi-valued relationship
{{{#!sql
SELECT "recipes_recipe"."id",
"recipes_recipe"."title",
"recipes_recipe"."date_posted",
"recipes_recipe"."created_by_id",
"recipes_recipe"."image",
"recipes_recipe"."preparation_time_in_minutes",
"recipes_recipe"."instructions",
"recipes_recipe"."level",
"recipes_recipe"."serving_size",
COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS
"shared_ingredients"
FROM "recipes_recipe"
LEFT OUTER JOIN "recipes_ingredienttorecipe" ON ("recipes_recipe"."id" =
"recipes_ingredienttorecipe"."recipe_id_id")
INNER JOIN "recipes_ingredienttorecipe" T4 ON ("recipes_recipe"."id" =
T4."recipe_id_id")
LEFT OUTER JOIN "recipes_tagtorecipe" ON ("recipes_recipe"."id" =
"recipes_tagtorecipe"."recipe_id_id")
WHERE (T4."ingredient_id_id" IN
(SELECT U0."id"
FROM "recipes_ingredient" U0
INNER JOIN "recipes_ingredienttorecipe" U1 ON (U0."id" =
U1."ingredient_id_id")
WHERE U1."recipe_id_id" = 1)
AND NOT ("recipes_recipe"."id" = 1))
GROUP BY "recipes_recipe"."id",
"recipes_recipe"."title",
"recipes_recipe"."date_posted",
"recipes_recipe"."created_by_id",
"recipes_recipe"."image",
"recipes_recipe"."preparation_time_in_minutes",
"recipes_recipe"."instructions",
"recipes_recipe"."level",
"recipes_recipe"."serving_size",
"recipes_tagtorecipe"."tag_id_id"
HAVING (("recipes_tagtorecipe"."tag_id_id" IN
(SELECT U0."id"
FROM "recipes_tag" U0
INNER JOIN "recipes_tagtorecipe" U1 ON (U0."id" =
U1."tag_id_id")
WHERE U1."recipe_id_id" = 1)
AND COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 2)
OR COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4);
}}}
Notice how `recipes_ingredienttorecipe` is joined twice which means
`COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS
"shared_ingredients"` will actually be the `COUNT` of the
[http://charettes.name/djangoconus2022/slides.html#21 product of rows]
`2X2=4` (see #10060). In other words, aggregate annotations when many
multi-valued relationships are involved are broken and have been for while
and in your case it's the
`COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4)` that is
allowing your test to pass on `main`. The fact this problems manifests
itself in queryset combination when filtering against broken aggregate
annotation is just a symptom of a much larger problem.
If you define one of your aggregate annotations as a subquery to avoid
joining more than one multi-valued relationship at once (#28296) things
properly work
{{{#!python
def find_similar_recipes(self):
return (
Recipe.objects.annotate(
shared_ingredients_cnt=Count(
"ingredients",
filter=Q(ingredients__in=self.ingredients.all()),
),
# Using a subquery to target the _other_ multi-valued
relationship avoids X product of rows
# with ingredients
shares_tag=Exists(
TagToRecipe.objects.filter(
tag_id__in=self.tags.all(),
recipe_id=OuterRef("pk"),
)
),
)
.filter(
Q(shared_ingredients_cnt__gte=4)
| Q(shares_tag=True, shared_ingredients_cnt__gte=2)
)
.exclude(id=self.id)
)
}}}
Which can also use queryset combination without issues
{{{#!python
def find_similar_recipes(self):
base = Recipe.objects.annotate(
shared_ingredients_cnt=Count(
"ingredients",
filter=Q(ingredients__in=self.ingredients.all()),
)
).exclude(id=self.id)
same_tag_and_at_least_two_shared_ingredients = base.annotate(
shares_tag=Exists(
TagToRecipe.objects.filter(
tag_id__in=self.tags.all(),
recipe_id=OuterRef("pk"),
)
)
).filter(Q(shares_tag=True, shared_ingredients_cnt__gte=2))
at_least_four_shared_ingredients =
base.filter(shared_ingredients_cnt__gte=4)
return (
same_tag_and_at_least_two_shared_ingredients |
at_least_four_shared_ingredients
)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34728#comment:14>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/010701899a2003f3-54853793-f8ac-4829-8740-a5bba867f89d-000000%40eu-central-1.amazonses.com.