#28128: Fulltext search very slow with annotate(SearchVector)
--------------------------------------------+------------------------
Reporter: Gavin Wahl | Owner: (none)
Type: Uncategorized | Status: new
Component: contrib.postgres | Version: 1.11
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
--------------------------------------------+------------------------
I have this model:
{{{#!python
class Bug(models.Model):
# .. fields ...
# This field is managed by Django, adding stuff to it that should be
# included in the fulltext search.
fulltext = models.TextField()
}}}
The `Bug` table has many rows, so I need to add an index on the fulltext
column for fulltext search:
{{{#!sql
CREATE INDEX ON buggy_bug USING GIN (to_tsvector('english'::regconfig,
COALESCE("fulltext", '')));
}}}
Note that postgres requires the `english` config argument to `to_tsvector`
for it be allowed to be used in an index.
Now, at query time, the expression Django uses must exactly match the
expression in the functional index. By default,
`filter(fulltext__search='foo')` leaves out the config argument to
`to_tsvector`, so my index isn't used. The documentation shows us how to
fix this, by annotating a search field:
{{{#!sql
Bug.objects.annotate(
search=SearchVector('fulltext', config='english')
).filter(
search='foo'
)
}}}
Now the expressions match and the index is used. However, the query is
still crazy slow. This is the query Django does:
{{{#!sql
SELECT "buggy_bug"."id",
to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) AS "search"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
}}}
Which takes 2500 milliseconds on my data, and EXPLAIN confirms it is using
the index. With some playing around with it, I find that removing the
annotated search column gives a huge speedup:
{{{#!sql
SELECT "buggy_bug"."id"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext",
'')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
}}}
This query only takes 11 milliseconds.
What's the point of selecting the result of to_tsvector? We don't actually
use it in Django, we just had to do it to specify the config to use for
to_tsvector.
There needs to be a way to specify the config for to_tsvector in order to
use an index without a huge slowdown.
--
Ticket URL: <https://code.djangoproject.com/ticket/28128>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/052.3865a711adb697d06d2bdab60d213039%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.