Hello,

I would like to open a discussion on the change I have proposed with pull 
request #7277 <https://github.com/django/django/pull/7277>.

To bring the discussion here, the problem with current implementation of 
get_search_results() is that, when search fields include fields through 
reverse relationships, it produces queries that the more words are used in 
the search term the more inefficient they are. This inefficiency comes from 
duplicate left joins with same tables which are in that case introduced by 
Django ORM. There have been relevant reports by others before, which you 
can see in tickets #16603 <https://code.djangoproject.com/ticket/16603> and 
#25789 <https://code.djangoproject.com/ticket/25789>. However, judging by 
the fact that for five years there has not been a solution for this, it 
seems that it is not an easy fix.

That said, this inefficiency can easily get the system down, as users in 
the admin can use a few words in the search term, either deliberately or by 
mistake, e.g. by accidentally copying & pasting a whole sentence or 
paragraph. In my case, with just 3 words in the search term and 4 tables 
involved in the search (with their size being in the range of 1500 to 4000 
rows), the query had been running for 25 minutes before I actually killed 
it on the database server. Evenmore, the impact of that inefficiency can be 
easily multiplied as users are not that patient and tend to repeat the 
search again and again when they see that they are not getting results fast 
enough.

So, I wrote a small patch which changes the way the query is built and, 
happily, the problem is gone. However, as Simon Charette has pointed out in 
the pull request's discussion, there is the corner case scenario in which 
some results may not be returned.
I am quoting his exact words:

Keep in mind that filter(or_queries[0] & or_queries[1] & ... & 
> or_queries[n]) can generate different results from 
> filter(or_queries[0]).filter(or_queries[1]).filter(...).filter(or_queries[n]) 
> if any of filters spans multi-valued relationships.
> For example, given models Book and Author and a AuthorAmin searching both 
> books__title and books__description your solution could filter out results 
> that use to be displayed as search terms will have to be present in both 
> Book.title and Book.description to match.


So, the question is whether that behavior is absolutely necessary for 
searches in the admin. I personally believe that the compromise is small 
enough and the benefit is really great. We are eliminating a possibility 
for our normal users to cause a denial of service through a simple search 
in the admin, with just a little compromise on the powerfulness of the 
search function. What are your thoughts?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c15af264-4839-4292-8a4c-53aea76026cf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to