After testing for a while, there really doesn't seem to be a good way to do 
this. .

Annotation is required since the query is filtered based on the annotated 
value, so any attempt to clear annotations would fail. Although a lookup 
could be used for the filtering.
But a custom lookup is not possible since the query also sorts by the 
annotated value, so an annotation is required.

This means that I would need to use a custom lookup for count queries, and 
annotation for result queries, which would mean a lot of duplicated code 
and also that the Django Paginator class can't be used since it expects 
only one queryset.


Are there really no options to make a count with a custom function to skip 
the group by clause (and so the inner query)? I mean there's no group by at 
all with the original query, it's just when count is used.

To summarize, the original query is a bit more complex than the one 
described on top, reason annotate is needed, and it looks more like:

q = 
Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary')).filter(score__gt=0.6).order_by('-score')

Which is then sent to a paginator, that ends up almost adding 3 times more 
database query time do to the horrid count query generated.


El viernes, 24 de noviembre de 2017, 18:04:21 (UTC-3), Cristiano Coelho 
escribió:
>
> Hello Simon,
>
> You are right, the score is really not meant to be attached so a custom 
> lookup might work, if it wasn't for the issue that an order_by clause would 
> fail without the annotation. 
>
> So it seems it's either update and duplicate a lot of code or find a very 
> ugly work around, I was hoping I missed some flag or implementation detail 
> on the custom function to tell the query builder to avoid all the grouping 
> and subquery stuff, will research more...
>
>
> El viernes, 24 de noviembre de 2017, 14:39:27 (UTC-3), Simon Charette 
> escribió:
>>
>> Hello Cristiano,
>>
>> If you are solely using this annotation for querying purpose, that means 
>> you are
>> not expecting a `score` attribute to be attached to your `Vulnerability` 
>> instances,
>> you could try defining your function as a lookup on CharField/TextField 
>> instead.
>>
>> You could then use it only when required by passing a tuple of values to 
>> it.
>>
>> e.g.
>>
>> Vulnerability.objects.filter(summary__wordsimilarity_gt=('test', 
>> threshold))
>>
>> If you need the value annotated I'm afraid the best solution would be to 
>> use
>> a custom paginator class. You could give a try at detecting whether or not
>> an annotation is referenced before removing it from query.annotations and
>> report your finding on the aforementioned ticket but I expect this to be
>> relatively hard to do correctly. Still it would make implementing 
>> `alias()`
>> way easier and provide help for users in the situation as you.
>>
>> Best,
>> Simon
>>
>> Le vendredi 24 novembre 2017 08:14:10 UTC-5, Cristiano Coelho a écrit :
>>>
>>> Hello Simon,
>>>
>>> That private API is good to know, but now that I think of it would still 
>>> not work for me, since my queryset is passed to a paginator and that's the 
>>> class that does both the count and actual queryset execution, so need a 
>>> queryset that can have both the annotation but also clears it if count is 
>>> called so it doesn't create the redundant sub queries.
>>>
>>> I'm wondering what's better, should I try to resolve this at the manager 
>>> level overriding count? I feel like this might cause issues if the 
>>> annotation is actually legit (ie with an aggregate) and it needs the 
>>> subquery after all.
>>> The other option is to subclass the paginator class with a special one 
>>> that does this annotation clearing before running count.
>>>
>>> Even with these cases, if the annotated value is used later with a 
>>> filter query I can't really simply removed, but the sub queries and extra 
>>> function calls really doesn't make sense to be there when doing a count, so 
>>> it seems that all the options are quite bad and hackish.
>>>
>>> Any other options?
>>>
>>>
>>> El viernes, 24 de noviembre de 2017, 1:12:07 (UTC-3), Simon Charette 
>>> escribió:
>>>>
>>>> Hello Cristiano,
>>>>
>>>> > Isn't there a way (even if it's hackish) to simply clear the 
>>>> annotations (and order by)? I know querysets are smart enough to not 
>>>> include the order by clause if there's a count.
>>>>
>>>> The ordering is cleared on `count()` because it shouldn't interfere 
>>>> with the result in any way. You can clear annotations using 
>>>> `queryset.query.annotations.clear()` but be aware that it is a private API 
>>>> that could change under your feet. Make sure to make a copy of the 
>>>> queryset 
>>>> (e.g. copy = queryset.all()) before performing this change as it will 
>>>> alter 
>>>> the queryset in place.
>>>>
>>>> Best,
>>>> Simon
>>>>  
>>>> Le jeudi 23 novembre 2017 22:41:41 UTC-5, Cristiano Coelho a écrit :
>>>>>
>>>>> Hello Simon, thanks for the response.
>>>>>
>>>>> The above code is just an example, the reason behind the annotate 
>>>>> because there's some complicated code that builds a queryset and 
>>>>> annotates 
>>>>> it so it can easily be reused. It works fine 99% of the time except when 
>>>>> there's a count involved and it ends up being redundant. The solution 
>>>>> would 
>>>>> be to not annotate anything and replace the code in multiple places to 
>>>>> add 
>>>>> the annotate call (or similar using a custom queryset or manager I 
>>>>> guess), 
>>>>> but that's quite painful and will end up with a lot of duplicated code 
>>>>> since there's also an order_by that follows the annotate that needs to be 
>>>>> moved over as well
>>>>>
>>>>> Isn't there a way (even if it's hackish) to simply clear the 
>>>>> annotations (and order by)? I know querysets are smart enough to not 
>>>>> include the order by clause if there's a count.
>>>>>
>>>>> You could also suggest using two separate calls or a flag to pass down 
>>>>> to the internal code so it doesn't include the additional stuff, but that 
>>>>> wouldn't work since paginators accept only one query set for example and 
>>>>> internall uses it for both count and results.
>>>>>
>>>>>
>>>>>
>>>>> El viernes, 24 de noviembre de 2017, 0:05:29 (UTC-3), Simon Charette 
>>>>> escribió:
>>>>>>
>>>>>> Hello Cristiano,
>>>>>>
>>>>>> I understand your frustration but please avoid using the developer 
>>>>>> mailing list
>>>>>> as a second tier support channel. I suggest you try the IRC #django 
>>>>>> channel if
>>>>>> you need to want to get faster support.
>>>>>>
>>>>>> What's happening here is that annotate() really means "select this 
>>>>>> field" while
>>>>>> in your other case you use a lookup (summary__icontains) which are 
>>>>>> only going to
>>>>>> be added to the WHERE clause of your query.
>>>>>>
>>>>>> I'm not sure why you are annotating your queryset without referring 
>>>>>> to it in
>>>>>> a filter clause later on but the ORM cannot simply ignore it when you 
>>>>>> are
>>>>>> performing your `count()` because some annotations could interfere 
>>>>>> with grouping
>>>>>> somehow.
>>>>>>
>>>>>> There is an open ticket[0] to add support for an `alias()` method 
>>>>>> that would
>>>>>> allow the ORM to clear/ignore the specified expressions if it's not 
>>>>>> referenced
>>>>>> in the query.
>>>>>>
>>>>>> In the mean time I think the best approach would be to avoid 
>>>>>> annotating the
>>>>>> queryset if your don't need to reference the score.
>>>>>>
>>>>>> Cheers,
>>>>>> Simon
>>>>>>
>>>>>> [0] https://code.djangoproject.com/ticket/27719
>>>>>>
>>>>>> Le mardi 21 novembre 2017 08:46:21 UTC-5, Cristiano Coelho a écrit :
>>>>>>>
>>>>>>> Hmm, should I try with the dev mailing list? Guess it's something no 
>>>>>>> one faced before?
>>>>>>>
>>>>>>> El martes, 14 de noviembre de 2017, 22:54:23 (UTC-3), Cristiano 
>>>>>>> Coelho escribió:
>>>>>>>>
>>>>>>>> I'm getting some very odd query when combining annotate with count. 
>>>>>>>> See the following:
>>>>>>>>
>>>>>>>> >>> q = 
>>>>>>>>> Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary'))
>>>>>>>>> >>> q.count()
>>>>>>>>> 3094
>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>> 'SELECT COUNT(*) 
>>>>>>>>> FROM (
>>>>>>>>>     SELECT "vulnerabilities_vulnerability"."id" AS Col1, 
>>>>>>>>> custom_word_similarity(\'test\', 
>>>>>>>>> "vulnerabilities_vulnerability"."summary") 
>>>>>>>>> AS "score" 
>>>>>>>>>     FROM "vulnerabilities_vulnerability" 
>>>>>>>>>     GROUP BY "vulnerabilities_vulnerability"."id", 
>>>>>>>>> custom_word_similarity(\'test\', 
>>>>>>>>> "vulnerabilities_vulnerability"."summary")
>>>>>>>>> ) subquery
>>>>>>>>> >>> q2 = Vulnerability.objects.filter(summary__icontains='test')
>>>>>>>>> >>> q2.count()
>>>>>>>>> 33
>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>> 'SELECT COUNT(*) AS "__count" 
>>>>>>>>> FROM "vulnerabilities_vulnerability" 
>>>>>>>>> WHERE UPPER("vulnerabilities_vulnerability"."summary"::text) LIKE 
>>>>>>>>> UPPER(\'%test%\')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Custom function code, is this what's causing the odd count 
>>>>>>>> behavior? Did I miss anything?
>>>>>>>>
>>>>>>>> class WordTrigramCustomSimilarity(Func):
>>>>>>>>>     function = 'custom_word_similarity' 
>>>>>>>>>     def __init__(self, string, expression, **extra):
>>>>>>>>>         if not hasattr(string, 'resolve_expression'):
>>>>>>>>>             string = Value(string)
>>>>>>>>>         super(WordTrigramCustomSimilarity, self).__init__(string, 
>>>>>>>>> expression, output_field=FloatField(), **extra)
>>>>>>>>
>>>>>>>>
>>>>>>>> I would expect for the query to be a simple count, rather than a 
>>>>>>>> nested query with a useless group by (correct me if I'm wrong).
>>>>>>>> The issue gets even worse if the function is expensive, since it 
>>>>>>>> gets called when it's not needed at all, more than once.
>>>>>>>> Also the issue behaves pretty much the same if the queryset 
>>>>>>>> includes filtering and ordering but I didn't include it here for 
>>>>>>>> simplicity.
>>>>>>>>
>>>>>>>> Using Django 1.11.7 + postgres (psycopg) backend.
>>>>>>>>
>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/cbdc0b45-1a55-497a-8263-a32a03a201bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to