#35099: Combining QuerySets with "|" or "&" produce side effects affecting
further
queries
-------------------------------------+-------------------------------------
Reporter: Alan | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 5.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Alan:
Old description:
> Hello everyone.
>
> Combining some queries with "|" or "&" somehow affects queries involved
> in the operation, leading to malformed SQL and unexpected results.
>
> Here are details and steps to reproduce. Apologise, for maybe a bit
> confusing model names, I copied them from production.
> {{{
> class SiteUser(models.Model):
> pass
>
> class Notification(models.Model):
> user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
>
> class PayoutRequest(models.Model):
> requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
> }}}
>
> Test:
> {{{
> from django.test import TestCase
> from django.db.models import OuterRef, Exists
> from reproduce.models import Notification, SiteUser, PayoutRequest
>
> class Reproduce(TestCase):
>
> def test(self):
> u01 = SiteUser.objects.create()
> u02 = SiteUser.objects.create()
> u03 = SiteUser.objects.create()
>
> Notification.objects.create(user=u01)
> PayoutRequest.objects.create(requester=u01)
> Notification.objects.create(user=u02)
> PayoutRequest.objects.create(requester=u03)
>
> are_active = SiteUser.objects.all().distinct()
> got_money = SiteUser.objects.filter(
> Exists(PayoutRequest.objects.filter(requester=OuterRef('pk')))
> ).distinct()
> whatever_query = SiteUser.objects.all().distinct()
>
> # Execute queries first time
> need_help = are_active.exclude(pk__in=got_money)
> notified =
> Notification.objects.filter(user__in=need_help).values_list('user_id',
> flat=True)
> query_before = str(notified.query)
>
> self.assertEqual(len(notified), 1) # correct
>
> whatever_query | got_money # Touch "got_money" with any other
> query
>
> # Execute same queries second time
> need_help = are_active.exclude(pk__in=got_money)
> notified =
> Notification.objects.filter(user__in=need_help).values_list('user_id',
> flat=True)
> query_after = str(notified.query)
>
> print(query_before)
> print(query_after)
> self.assertEqual(len(notified), 1) # expected 1, got 0
> self.assertEqual(query_before, query_after) # false
> }}}
>
> As you can see, merely touching the `got_money` query with any other
> query leads to modifying the results of the same queries executed after
> that.
> This test case probably may be simplified even further, but
> unfortunately, I have no more time resources to dig much deeper.
>
> I had another queries built using simple .filter() and .exclude(). Those
> were not affected by combining.
> I found only this query `got_money` using `Exists()` and `OuterRef()` to
> be affected. There might be more of which I am not aware of.
>
> The reason for this I don't know, but `query_before` and `query_after`
> differs.
> `query_before` correctly separates subqueries using W0, U0, V0 aliases,
> while the `query_after` uses a single U0 alias for all subqueries,
> leading to incorrect results.
>
> Before
> {{{
> SELECT
> "reproduce_notification"."user_id"
> FROM
> "reproduce_notification"
> WHERE
> "reproduce_notification"."user_id" IN (
> SELECT
> DISTINCT W0."id"
> FROM
> "reproduce_siteuser" W0
> WHERE
> NOT (
> W0."id" IN (
> SELECT
> DISTINCT V0."id"
> FROM
> "reproduce_siteuser" V0
> WHERE
> EXISTS(
> SELECT
> 1 AS "a"
> FROM
> "reproduce_payoutrequest" U0
> WHERE
> U0."requester_id" = (V0."id")
> LIMIT
> 1
> )
> )
> )
> )
> }}}
>
> After
> {{{
> SELECT
> "reproduce_notification"."user_id"
> FROM
> "reproduce_notification"
> WHERE
> "reproduce_notification"."user_id" IN (
> SELECT
> DISTINCT U0."id"
> FROM
> "reproduce_siteuser" U0
> WHERE
> NOT (
> U0."id" IN (
> SELECT
> DISTINCT U0."id"
> FROM
> "reproduce_siteuser" U0
> WHERE
> EXISTS(
> SELECT
> 1 AS "a"
> FROM
> "reproduce_payoutrequest" U0
> WHERE
> U0."requester_id" = (U0."id")
> LIMIT
> 1
> )
> )
> )
> )
> }}}
New description:
Hello everyone.
Combining some queries with "|" or "&" somehow affects queries involved
in the operation, leading to malformed SQL and unexpected results.
Here are details and steps to reproduce. Apologise, for maybe a bit
confusing model names, I copied them from production.
{{{
class SiteUser(models.Model):
pass
class Notification(models.Model):
user = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
class PayoutRequest(models.Model):
requester = models.ForeignKey(to=SiteUser, on_delete=models.CASCADE)
}}}
Test:
{{{
from django.test import TestCase
from django.db.models import OuterRef, Exists
from reproduce.models import Notification, SiteUser, PayoutRequest
class Reproduce(TestCase):
def test(self):
u01 = SiteUser.objects.create()
u02 = SiteUser.objects.create()
u03 = SiteUser.objects.create()
Notification.objects.create(user=u01)
PayoutRequest.objects.create(requester=u01)
Notification.objects.create(user=u02)
PayoutRequest.objects.create(requester=u03)
are_active = SiteUser.objects.all().distinct()
got_money = SiteUser.objects.filter(
Exists(PayoutRequest.objects.filter(requester=OuterRef('pk')))
).distinct()
whatever_query = SiteUser.objects.all().distinct()
# Execute queries first time
need_help = are_active.exclude(pk__in=got_money)
notified =
Notification.objects.filter(user__in=need_help).values_list('user_id',
flat=True)
query_before = str(notified.query)
self.assertEqual(len(notified), 1) # correct
whatever_query | got_money # Touch "got_money" with any other
query
# Execute same queries second time
need_help = are_active.exclude(pk__in=got_money)
notified =
Notification.objects.filter(user__in=need_help).values_list('user_id',
flat=True)
query_after = str(notified.query)
print(query_before)
print(query_after)
self.assertEqual(len(notified), 1) # expected 1, got 0
self.assertEqual(query_before, query_after) # false
}}}
As you can see, merely touching the `got_money` query with any other query
leads to modifying the results of the same queries executed after that.
This test case probably may be simplified even further, but unfortunately,
I have no more time resources to dig much deeper.
I had another queries built using simple .filter() and .exclude(). Those
were not affected by combining.
I found only this query `got_money` using `Exists()` and `OuterRef()` to
be affected. There might be more of which I am not aware of.
The reason for this I don't know, but `query_before` and `query_after`
differs.
`query_before` correctly separates subqueries using W0, U0, V0 aliases,
while the `query_after` uses a single U0 alias for all subqueries, leading
to incorrect results.
Before
{{{
SELECT
"reproduce_notification"."user_id"
FROM
"reproduce_notification"
WHERE
"reproduce_notification"."user_id" IN (
SELECT
DISTINCT W0."id"
FROM
"reproduce_siteuser" W0
WHERE
NOT (
W0."id" IN (
SELECT
DISTINCT V0."id"
FROM
"reproduce_siteuser" V0
WHERE
EXISTS(
SELECT
1 AS "a"
FROM
"reproduce_payoutrequest" U0
WHERE
U0."requester_id" = (V0."id")
LIMIT
1
)
)
)
)
}}}
After
{{{
SELECT
"reproduce_notification"."user_id"
FROM
"reproduce_notification"
WHERE
"reproduce_notification"."user_id" IN (
SELECT
DISTINCT U0."id"
FROM
"reproduce_siteuser" U0
WHERE
NOT (
U0."id" IN (
SELECT
DISTINCT U0."id"
FROM
"reproduce_siteuser" U0
WHERE
EXISTS(
SELECT
1 AS "a"
FROM
"reproduce_payoutrequest" U0
WHERE
U0."requester_id" = (U0."id")
LIMIT
1
)
)
)
)
}}}
Found bug in version 4.2.7, but reproduced it in 5.0.1 the same way.
Feel free to request any additional information you might need for this.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/35099#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107018cefba33d8-6199a062-ab84-4809-836f-4308518ddf70-000000%40eu-central-1.amazonses.com.