#28128: Fulltext search very slow with annotate(SearchVector)
--------------------------------------------+------------------------
               Reporter:  Gavin Wahl        |          Owner:  (none)
                   Type:  Uncategorized     |         Status:  new
              Component:  contrib.postgres  |        Version:  1.11
               Severity:  Normal            |       Keywords:
           Triage Stage:  Unreviewed        |      Has patch:  0
    Needs documentation:  0                 |    Needs tests:  0
Patch needs improvement:  0                 |  Easy pickings:  0
                  UI/UX:  0                 |
--------------------------------------------+------------------------
 I have this model:

 {{{#!python
 class Bug(models.Model):
     # .. fields ...

     # This field is managed by Django, adding stuff to it that should be
     # included in the fulltext search.
     fulltext = models.TextField()
 }}}

 The `Bug` table has many rows, so I need to add an index on the fulltext
 column for fulltext search:

 {{{#!sql
 CREATE INDEX ON buggy_bug USING GIN (to_tsvector('english'::regconfig,
 COALESCE("fulltext", '')));
 }}}

 Note that postgres requires the `english` config argument to `to_tsvector`
 for it be allowed to be used in an index.

 Now, at query time, the expression Django uses must exactly match the
 expression in the functional index. By default,
 `filter(fulltext__search='foo')` leaves out the config argument to
 `to_tsvector`, so my index isn't used. The documentation shows us how to
 fix this, by annotating a search field:

 {{{#!sql
 Bug.objects.annotate(
     search=SearchVector('fulltext', config='english')
 ).filter(
     search='foo'
 )
 }}}

 Now the expressions match and the index is used. However, the query is
 still crazy slow. This is the query Django does:

 {{{#!sql
 SELECT "buggy_bug"."id",
        to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
 '')) AS "search"
 FROM "buggy_bug"
 WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
 '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
 }}}

 Which takes 2500 milliseconds on my data, and EXPLAIN confirms it is using
 the index.  With some playing around with it, I find that removing the
 annotated search column gives a huge speedup:

 {{{#!sql
 SELECT "buggy_bug"."id"
 FROM "buggy_bug"
 WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
 '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
 }}}

 This query only takes 11 milliseconds.

 What's the point of selecting the result of to_tsvector? We don't actually
 use it in Django, we just had to do it to specify the config to use for
 to_tsvector.

 There needs to be a way to specify the config for to_tsvector in order to
 use an index without a huge slowdown.

--
Ticket URL: <https://code.djangoproject.com/ticket/28128>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.3865a711adb697d06d2bdab60d213039%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to