#34393: A filter query returns more items than the original queryset provides after applying INNER JOIN
---------------------------------------------------------------------------
               Reporter:  Ľuboš Mjachky
                  Owner:  nobody
                   Type:  Uncategorized
                 Status:  new
              Component:  Database layer (models, ORM)
                Version:  3.2
               Severity:  Normal
               Keywords:  filter query duplicate distinct
           Triage Stage:  Unreviewed
              Has patch:  0
    Needs documentation:  0
            Needs tests:  0
Patch needs improvement:  0
          Easy pickings:  0
                  UI/UX:  0
---------------------------------------------------------------------------
 In our project, we found that a filter() call returns more entries than
 the queryset it was applied to contains.

 The following piece of code is involved:
 {{{
 # qs.count() == 4
 scoped_repos = repo_viewset.get_queryset().values_list("pk", flat=True)
 filtered_content = qs.filter(repositories__in=scoped_repos)
 # filtered_content.count() == 8
 }}}

 The generated query:
 {{{
 SELECT * FROM "rpm_package"
 INNER JOIN "core_content"
   ON ("rpm_package"."content_ptr_id" = "core_content"."pulp_id")
 INNER JOIN "core_repositorycontent"
   ON ("core_content"."pulp_id" = "core_repositorycontent"."content_id")
 WHERE "core_repositorycontent"."repository_id" IN
   (c35b7039-2c2c-48e3-8f4f-b0eeabad8af1, ee39a78b-9dd5-4bdf-85d9-eb6406b6ef49)
 }}}

 We noticed that the query is built with an INNER JOIN clause instead of a
 LEFT JOIN clause. The core_repositorycontent table contains a lot of
 duplicates. We believe that these duplicates should not be a problem.
 Appending distinct() to the call resolves the issue. See
 https://github.com/pulp/pulpcore/pull/3642.
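
 The row multiplication described above can be reproduced outside Django.
 The following is a minimal sketch using Python's built-in sqlite3 module.
 The tables "content" and "repositorycontent" and the repository names are
 hypothetical, simplified stand-ins for core_content and
 core_repositorycontent, not the real Pulp schema. It shows that an INNER
 JOIN yields one result row per matching row in the joined table, and that
 DISTINCT collapses those rows again.

```python
import sqlite3

# Hypothetical, simplified stand-ins for core_content and
# core_repositorycontent; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE repositorycontent (
        id INTEGER PRIMARY KEY,
        content_id INTEGER REFERENCES content(id),
        repository_id TEXT
    );
    INSERT INTO content (id) VALUES (1), (2), (3), (4);
    -- Every content row is linked twice, once per repository.
    INSERT INTO repositorycontent (content_id, repository_id) VALUES
        (1, 'repo-a'), (1, 'repo-b'),
        (2, 'repo-a'), (2, 'repo-b'),
        (3, 'repo-a'), (3, 'repo-b'),
        (4, 'repo-a'), (4, 'repo-b');
""")

JOIN_SQL = """
    SELECT {select} content.id
    FROM content
    INNER JOIN repositorycontent
        ON content.id = repositorycontent.content_id
    WHERE repositorycontent.repository_id IN ('repo-a', 'repo-b')
"""


def count_rows(select_modifier=""):
    """Run the join with an optional SELECT modifier and count result rows."""
    sql = JOIN_SQL.format(select=select_modifier)
    return len(conn.execute(sql).fetchall())


total = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
joined = count_rows()                # one row per matching join-table row
deduplicated = count_rows("DISTINCT")  # duplicates collapsed again

print(total, joined, deduplicated)
```

 In this sketch the join returns twice as many rows as the content table
 holds, because each content row matches two rows in the joined table.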

 The question is whether this is a bug in Django (i.e., a filter query can
 return more elements than there are in the original queryset) or a problem
 on our side, in which case we should restructure the query in a specific
 way. Any advice is welcome.
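
 As one possible restructuring, the membership test can be written as a
 subquery (a semi-join) instead of a join, so that each content row appears
 at most once regardless of how many join-table rows match. The sketch
 below again uses sqlite3 with hypothetical table and repository names. It
 is an illustration of the SQL shape only, not a statement about what
 Django generates.

```python
import sqlite3

# Hypothetical, simplified schema; not the real Pulp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY);
    CREATE TABLE repositorycontent (
        id INTEGER PRIMARY KEY,
        content_id INTEGER REFERENCES content(id),
        repository_id TEXT
    );
    INSERT INTO content (id) VALUES (1), (2), (3), (4);
    -- Content 1 is linked three times (including a duplicate link),
    -- content 2 once, content 3 only to an unrelated repository,
    -- and content 4 not at all.
    INSERT INTO repositorycontent (content_id, repository_id) VALUES
        (1, 'repo-a'), (1, 'repo-a'), (1, 'repo-b'),
        (2, 'repo-a'),
        (3, 'repo-c');
""")

# EXISTS only asks whether at least one matching link exists, so the
# number of matching links does not affect the number of result rows.
SEMI_JOIN_SQL = """
    SELECT content.id
    FROM content
    WHERE EXISTS (
        SELECT 1
        FROM repositorycontent
        WHERE repositorycontent.content_id = content.id
          AND repositorycontent.repository_id IN (?, ?)
    )
    ORDER BY content.id
"""

rows = conn.execute(SEMI_JOIN_SQL, ("repo-a", "repo-b")).fetchall()
matching_ids = [row[0] for row in rows]

print(matching_ids)
```

 In ORM terms, a comparable shape could probably be reached with a
 pk__in subquery or an Exists() expression over the join model. The exact
 model and field names depend on the project and are not taken from this
 ticket.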

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34393>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
