Re: Strange query when using annotate and count

Cristiano Coelho Wed, 06 Dec 2017 11:21:06 -0800

Update. Found a work around that gets rid of the unnecessary group by (and 
hence speeds up the query, from ~200ms to ~100ms in my use case). Simply 
adding a .values('pk').
Bad thing, every model needs a custom manager and it will still use an 
inner query, still trying to figure out the side effects of it, nothing so 
far...


class FasterCountQuerySet(QuerySet):
>     def count(self):
>         return super(FasterCountQuerySet, self.values('pk')).count()
>


Makes me wonder if the original count code can be made "smarter" in order 
to prevent the unnecessary group by, or if this simply a bug


El miércoles, 6 de diciembre de 2017, 10:28:22 (UTC-3), Cristiano Coelho 
escribió:
>
> After testing for a while, there really doesn't seem to be a good way to 
> do this. .
>
> Annotation is required since the query is filtered based on the annotated 
> value, so any attempt to clear annotations would fail. Although a lookup 
> could be used for the filtering.
> But a custom lookup is not possible since the query also sorts by the 
> annotated value, so an annotation is required.
>
> This means that I would need to use a custom lookup for count queries, and 
> annotation for result queries, which would mean a lot of duplicated code 
> and also that the Django Paginator class can't be used since it expects 
> only one queryset.
>
>
> Are there really no options to make a count with a custom function to skip 
> the group by clause (and so the inner query)? I mean there's no group by at 
> all with the original query, it's just when count is used.
>
> To summarize, the original query is a bit more complex than the one 
> described on top, reason annotate is needed, and it looks more like:
>
> q = 
> Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary')).filter(score__gt=0.6).order_by('-score')
>
> Which is then sent to a paginator, that ends up almost adding 3 times more 
> database query time do to the horrid count query generated.
>
>
> El viernes, 24 de noviembre de 2017, 18:04:21 (UTC-3), Cristiano Coelho 
> escribió:
>>
>> Hello Simon,
>>
>> You are right, the score is really not meant to be attached so a custom 
>> lookup might work, if it wasn't for the issue that an order_by clause would 
>> fail without the annotation. 
>>
>> So it seems it's either update and duplicate a lot of code or find a very 
>> ugly work around, I was hoping I missed some flag or implementation detail 
>> on the custom function to tell the query builder to avoid all the grouping 
>> and subquery stuff, will research more...
>>
>>
>> El viernes, 24 de noviembre de 2017, 14:39:27 (UTC-3), Simon Charette 
>> escribió:
>>>
>>> Hello Cristiano,
>>>
>>> If you are solely using this annotation for querying purpose, that means 
>>> you are
>>> not expecting a `score` attribute to be attached to your `Vulnerability` 
>>> instances,
>>> you could try defining your function as a lookup on CharField/TextField 
>>> instead.
>>>
>>> You could then use it only when required by passing a tuple of values to 
>>> it.
>>>
>>> e.g.
>>>
>>> Vulnerability.objects.filter(summary__wordsimilarity_gt=('test', 
>>> threshold))
>>>
>>> If you need the value annotated I'm afraid the best solution would be to 
>>> use
>>> a custom paginator class. You could give a try at detecting whether or 
>>> not
>>> an annotation is referenced before removing it from query.annotations and
>>> report your finding on the aforementioned ticket but I expect this to be
>>> relatively hard to do correctly. Still it would make implementing 
>>> `alias()`
>>> way easier and provide help for users in the situation as you.
>>>
>>> Best,
>>> Simon
>>>
>>> Le vendredi 24 novembre 2017 08:14:10 UTC-5, Cristiano Coelho a écrit :
>>>>
>>>> Hello Simon,
>>>>
>>>> That private API is good to know, but now that I think of it would 
>>>> still not work for me, since my queryset is passed to a paginator and 
>>>> that's the class that does both the count and actual queryset execution, 
>>>> so 
>>>> need a queryset that can have both the annotation but also clears it if 
>>>> count is called so it doesn't create the redundant sub queries.
>>>>
>>>> I'm wondering what's better, should I try to resolve this at the 
>>>> manager level overriding count? I feel like this might cause issues if the 
>>>> annotation is actually legit (ie with an aggregate) and it needs the 
>>>> subquery after all.
>>>> The other option is to subclass the paginator class with a special one 
>>>> that does this annotation clearing before running count.
>>>>
>>>> Even with these cases, if the annotated value is used later with a 
>>>> filter query I can't really simply removed, but the sub queries and extra 
>>>> function calls really doesn't make sense to be there when doing a count, 
>>>> so 
>>>> it seems that all the options are quite bad and hackish.
>>>>
>>>> Any other options?
>>>>
>>>>
>>>> El viernes, 24 de noviembre de 2017, 1:12:07 (UTC-3), Simon Charette 
>>>> escribió:
>>>>>
>>>>> Hello Cristiano,
>>>>>
>>>>> > Isn't there a way (even if it's hackish) to simply clear the 
>>>>> annotations (and order by)? I know querysets are smart enough to not 
>>>>> include the order by clause if there's a count.
>>>>>
>>>>> The ordering is cleared on `count()` because it shouldn't interfere 
>>>>> with the result in any way. You can clear annotations using 
>>>>> `queryset.query.annotations.clear()` but be aware that it is a private 
>>>>> API 
>>>>> that could change under your feet. Make sure to make a copy of the 
>>>>> queryset 
>>>>> (e.g. copy = queryset.all()) before performing this change as it will 
>>>>> alter 
>>>>> the queryset in place.
>>>>>
>>>>> Best,
>>>>> Simon
>>>>>  
>>>>> Le jeudi 23 novembre 2017 22:41:41 UTC-5, Cristiano Coelho a écrit :
>>>>>>
>>>>>> Hello Simon, thanks for the response.
>>>>>>
>>>>>> The above code is just an example, the reason behind the annotate 
>>>>>> because there's some complicated code that builds a queryset and 
>>>>>> annotates 
>>>>>> it so it can easily be reused. It works fine 99% of the time except when 
>>>>>> there's a count involved and it ends up being redundant. The solution 
>>>>>> would 
>>>>>> be to not annotate anything and replace the code in multiple places to 
>>>>>> add 
>>>>>> the annotate call (or similar using a custom queryset or manager I 
>>>>>> guess), 
>>>>>> but that's quite painful and will end up with a lot of duplicated code 
>>>>>> since there's also an order_by that follows the annotate that needs to 
>>>>>> be 
>>>>>> moved over as well
>>>>>>
>>>>>> Isn't there a way (even if it's hackish) to simply clear the 
>>>>>> annotations (and order by)? I know querysets are smart enough to not 
>>>>>> include the order by clause if there's a count.
>>>>>>
>>>>>> You could also suggest using two separate calls or a flag to pass 
>>>>>> down to the internal code so it doesn't include the additional stuff, 
>>>>>> but 
>>>>>> that wouldn't work since paginators accept only one query set for 
>>>>>> example 
>>>>>> and internall uses it for both count and results.
>>>>>>
>>>>>>
>>>>>>
>>>>>> El viernes, 24 de noviembre de 2017, 0:05:29 (UTC-3), Simon Charette 
>>>>>> escribió:
>>>>>>>
>>>>>>> Hello Cristiano,
>>>>>>>
>>>>>>> I understand your frustration but please avoid using the developer 
>>>>>>> mailing list
>>>>>>> as a second tier support channel. I suggest you try the IRC #django 
>>>>>>> channel if
>>>>>>> you need to want to get faster support.
>>>>>>>
>>>>>>> What's happening here is that annotate() really means "select this 
>>>>>>> field" while
>>>>>>> in your other case you use a lookup (summary__icontains) which are 
>>>>>>> only going to
>>>>>>> be added to the WHERE clause of your query.
>>>>>>>
>>>>>>> I'm not sure why you are annotating your queryset without referring 
>>>>>>> to it in
>>>>>>> a filter clause later on but the ORM cannot simply ignore it when 
>>>>>>> you are
>>>>>>> performing your `count()` because some annotations could interfere 
>>>>>>> with grouping
>>>>>>> somehow.
>>>>>>>
>>>>>>> There is an open ticket[0] to add support for an `alias()` method 
>>>>>>> that would
>>>>>>> allow the ORM to clear/ignore the specified expressions if it's not 
>>>>>>> referenced
>>>>>>> in the query.
>>>>>>>
>>>>>>> In the mean time I think the best approach would be to avoid 
>>>>>>> annotating the
>>>>>>> queryset if your don't need to reference the score.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Simon
>>>>>>>
>>>>>>> [0] https://code.djangoproject.com/ticket/27719
>>>>>>>
>>>>>>> Le mardi 21 novembre 2017 08:46:21 UTC-5, Cristiano Coelho a écrit :
>>>>>>>>
>>>>>>>> Hmm, should I try with the dev mailing list? Guess it's something 
>>>>>>>> no one faced before?
>>>>>>>>
>>>>>>>> El martes, 14 de noviembre de 2017, 22:54:23 (UTC-3), Cristiano 
>>>>>>>> Coelho escribió:
>>>>>>>>>
>>>>>>>>> I'm getting some very odd query when combining annotate with 
>>>>>>>>> count. See the following:
>>>>>>>>>
>>>>>>>>> >>> q = 
>>>>>>>>>> Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary'))
>>>>>>>>>> >>> q.count()
>>>>>>>>>> 3094
>>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>>> 'SELECT COUNT(*) 
>>>>>>>>>> FROM (
>>>>>>>>>>     SELECT "vulnerabilities_vulnerability"."id" AS Col1, 
>>>>>>>>>> custom_word_similarity(\'test\', 
>>>>>>>>>> "vulnerabilities_vulnerability"."summary") 
>>>>>>>>>> AS "score" 
>>>>>>>>>>     FROM "vulnerabilities_vulnerability" 
>>>>>>>>>>     GROUP BY "vulnerabilities_vulnerability"."id", 
>>>>>>>>>> custom_word_similarity(\'test\', 
>>>>>>>>>> "vulnerabilities_vulnerability"."summary")
>>>>>>>>>> ) subquery
>>>>>>>>>> >>> q2 = Vulnerability.objects.filter(summary__icontains='test')
>>>>>>>>>> >>> q2.count()
>>>>>>>>>> 33
>>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>>> 'SELECT COUNT(*) AS "__count" 
>>>>>>>>>> FROM "vulnerabilities_vulnerability" 
>>>>>>>>>> WHERE UPPER("vulnerabilities_vulnerability"."summary"::text) LIKE 
>>>>>>>>>> UPPER(\'%test%\')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Custom function code, is this what's causing the odd count 
>>>>>>>>> behavior? Did I miss anything?
>>>>>>>>>
>>>>>>>>> class WordTrigramCustomSimilarity(Func):
>>>>>>>>>>     function = 'custom_word_similarity' 
>>>>>>>>>>     def __init__(self, string, expression, **extra):
>>>>>>>>>>         if not hasattr(string, 'resolve_expression'):
>>>>>>>>>>             string = Value(string)
>>>>>>>>>>         super(WordTrigramCustomSimilarity, self).__init__(string, 
>>>>>>>>>> expression, output_field=FloatField(), **extra)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would expect for the query to be a simple count, rather than a 
>>>>>>>>> nested query with a useless group by (correct me if I'm wrong).
>>>>>>>>> The issue gets even worse if the function is expensive, since it 
>>>>>>>>> gets called when it's not needed at all, more than once.
>>>>>>>>> Also the issue behaves pretty much the same if the queryset 
>>>>>>>>> includes filtering and ordering but I didn't include it here for 
>>>>>>>>> simplicity.
>>>>>>>>>
>>>>>>>>> Using Django 1.11.7 + postgres (psycopg) backend.
>>>>>>>>>
>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/24320bd9-e6df-4e6b-948f-39f302073f1d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Strange query when using annotate and count

Reply via email to