#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-------------------------------------+-------------------------------------
               Reporter:  Robert     |          Owner:  nobody
  Leach                              |
                   Type:  Bug        |         Status:  new
              Component:  Database   |        Version:  3.2
  layer (models, ORM)                |
               Severity:  Normal     |       Keywords:  sql, distinct,
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I have a rather complex database and an advanced search interface that
 creates complex queries.  It’s been working great for over a year now.

 I recently added a feature to count distinct related table records in the
 joined results. When I added fields to these many-related tables to
 `.distinct()` (and to `.order_by()`), I couldn’t get the test to execute
 without hitting an `InvalidColumnReference` error. And though I’m
 supplying the same expanded fields list to `.order_by()` that I am to
 `.distinct()`, the error claims that `SELECT DISTINCT ON expressions must
 match initial ORDER BY expressions`…

 When I print the SQL, the place where it notes a difference has a weird
 `T8` reference where the model name should be in an `order by` clause.
 The corresponding `distinct` clause has the full table name instead of the
 reference, which is what I suspect is triggering the exception.

 I was able to create a set of toy models and a test that minimally
 reproduces the exception...

 toymodels.py:
 {{{
 from django.db.models import Model, CharField, AutoField, ForeignKey,
 ManyToManyField, CASCADE

 class TestPeak(Model):
     id = AutoField(primary_key=True)
     name = CharField(max_length=10)
     compounds = ManyToManyField(
         to="TestCompound",
         related_name="testpeaks",
     )
     class Meta:
         verbose_name = "testpeak"
         verbose_name_plural = "testpeaks"
         ordering = ["name"]

 class TestCompound(Model):
     id = AutoField(primary_key=True)
     name = CharField(max_length=10)
     class Meta:
         verbose_name = "testcompound"
         verbose_name_plural = "testcompounds"
         ordering = ["name"]

 class TestSynonym(Model):
     name = CharField(max_length=10, primary_key=True)
     compound = ForeignKey(
         TestCompound, related_name="testsynonyms", on_delete=CASCADE
     )
     class Meta:
         verbose_name = "testsynonym"
         verbose_name_plural = "testsynonyms"
         ordering = ["compound", "name"]
 }}}

 test_bug.py:
 {{{
 from DataRepo.tests.tracebase_test_case import TracebaseTestCase
 from DataRepo.models.toymodels import TestPeak, TestCompound, TestSynonym
 from django.db.models import Q

 class DjangoSQLBug(TracebaseTestCase):
     maxDiff = None

     @classmethod
     def setUpTestData(cls):
         TestCompound.objects.create(name="testcpd")
         cpd = TestCompound.objects.get(id__exact=1)
         TestSynonym.objects.create(name="testsyn",compound=cpd)
         TestPeak.objects.create(name="testpk")
         pk = TestPeak.objects.get(id__exact=1)
         pk.compounds.add(cpd)

     def test_mm_om_query(self):
         q_exp = Q(name__iexact="testpk")
         distinct_fields = ['name', 'pk',
 'compounds__testsynonyms__compound', 'compounds__testsynonyms__name',
 'compounds__testsynonyms__pk', 'compounds__name', 'compounds__pk']
         qs =
 
TestPeak.objects.filter(q_exp).order_by(*distinct_fields).distinct(*distinct_fields)
         self.assertEqual(qs.count(), 1)
 }}}

 `python manage.py test` output:
 {{{
 Creating test database for alias 'default'...
 Creating test database for alias 'validation'...
 System check identified no issues (0 silenced).
 E
 ======================================================================
 ERROR: test_mm_om_query (DataRepo.tests.sqlbugtest.test_bug.DjangoSQLBug)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 84, in _execute
     return self.cursor.execute(sql, params)
 psycopg2.errors.InvalidColumnReference: SELECT DISTINCT ON expressions
 must match initial ORDER BY expressions
 LINE 1: ...peak"."id", "DataRepo_testsynonym"."compound_id", "DataRepo_...
                                                              ^


 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/DataRepo/tests/sqlbugtest/test_bug.py", line 21,
 in test_mm_om_query
     self.assertEqual(qs.count(), 1)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/models/query.py", line 412, in count
     return self.query.get_count(using=self.db)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/models/sql/query.py", line 519, in get_count
     number = obj.get_aggregation(using, ['__count'])['__count']
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/models/sql/query.py", line 504, in get_aggregation
     result = compiler.execute_sql(SINGLE)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
     cursor.execute(sql, params)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 66, in execute
     return self._execute_with_wrappers(sql, params, many=False,
 executor=self._execute)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
     return executor(sql, params, many, context)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 84, in _execute
     return self.cursor.execute(sql, params)
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/utils.py", line 90, in __exit__
     raise dj_exc_value.with_traceback(traceback) from exc_value
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 84, in _execute
     return self.cursor.execute(sql, params)
 django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must
 match initial ORDER BY expressions
 LINE 1: ...peak"."id", "DataRepo_testsynonym"."compound_id", "DataRepo_...
                                                              ^


 ----------------------------------------------------------------------
 Ran 1 test in 0.018s

 FAILED (errors=1)
 Destroying test database for alias 'default'...
 Destroying test database for alias 'validation'...
 gen-rl-macbookair[2022-05-05 12:34:44]:~/PROJECT-
 local/TRACEBASE/tracebase$
 }}}

 I posted on Django Users about this, if you would like more information:

 https://forum.djangoproject.com/t/is-this-a-bug-in-djangos-sql-creation-
 through-multiple-many-to-many-tables/13508/9

 We’re on Django 3.2 using Postgres.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33682>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070180956f13df-cfeba268-dfcf-4766-9d22-5e0179c60ab9-000000%40eu-central-1.amazonses.com.

Reply via email to