After testing for a while, there really doesn't seem to be a good way to do
this. .
Annotation is required since the query is filtered based on the annotated
value, so any attempt to clear annotations would fail. Although a lookup
could be used for the filtering.
But a custom lookup is not possible since the query also sorts by the
annotated value, so an annotation is required.
This means that I would need to use a custom lookup for count queries, and
annotation for result queries, which would mean a lot of duplicated code
and also that the Django Paginator class can't be used since it expects
only one queryset.
Are there really no options to make a count with a custom function to skip
the group by clause (and so the inner query)? I mean there's no group by at
all with the original query, it's just when count is used.
To summarize, the original query is a bit more complex than the one
described on top, reason annotate is needed, and it looks more like:
q =
Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary')).filter(score__gt=0.6).order_by('-score')
Which is then sent to a paginator, that ends up almost adding 3 times more
database query time do to the horrid count query generated.
El viernes, 24 de noviembre de 2017, 18:04:21 (UTC-3), Cristiano Coelho
escribió:
>
> Hello Simon,
>
> You are right, the score is really not meant to be attached so a custom
> lookup might work, if it wasn't for the issue that an order_by clause would
> fail without the annotation.
>
> So it seems it's either update and duplicate a lot of code or find a very
> ugly work around, I was hoping I missed some flag or implementation detail
> on the custom function to tell the query builder to avoid all the grouping
> and subquery stuff, will research more...
>
>
> El viernes, 24 de noviembre de 2017, 14:39:27 (UTC-3), Simon Charette
> escribió:
>>
>> Hello Cristiano,
>>
>> If you are solely using this annotation for querying purpose, that means
>> you are
>> not expecting a `score` attribute to be attached to your `Vulnerability`
>> instances,
>> you could try defining your function as a lookup on CharField/TextField
>> instead.
>>
>> You could then use it only when required by passing a tuple of values to
>> it.
>>
>> e.g.
>>
>> Vulnerability.objects.filter(summary__wordsimilarity_gt=('test',
>> threshold))
>>
>> If you need the value annotated I'm afraid the best solution would be to
>> use
>> a custom paginator class. You could give a try at detecting whether or not
>> an annotation is referenced before removing it from query.annotations and
>> report your finding on the aforementioned ticket but I expect this to be
>> relatively hard to do correctly. Still it would make implementing
>> `alias()`
>> way easier and provide help for users in the situation as you.
>>
>> Best,
>> Simon
>>
>> Le vendredi 24 novembre 2017 08:14:10 UTC-5, Cristiano Coelho a écrit :
>>>
>>> Hello Simon,
>>>
>>> That private API is good to know, but now that I think of it would still
>>> not work for me, since my queryset is passed to a paginator and that's the
>>> class that does both the count and actual queryset execution, so need a
>>> queryset that can have both the annotation but also clears it if count is
>>> called so it doesn't create the redundant sub queries.
>>>
>>> I'm wondering what's better, should I try to resolve this at the manager
>>> level overriding count? I feel like this might cause issues if the
>>> annotation is actually legit (ie with an aggregate) and it needs the
>>> subquery after all.
>>> The other option is to subclass the paginator class with a special one
>>> that does this annotation clearing before running count.
>>>
>>> Even with these cases, if the annotated value is used later with a
>>> filter query I can't really simply removed, but the sub queries and extra
>>> function calls really doesn't make sense to be there when doing a count, so
>>> it seems that all the options are quite bad and hackish.
>>>
>>> Any other options?
>>>
>>>
>>> El viernes, 24 de noviembre de 2017, 1:12:07 (UTC-3), Simon Charette
>>> escribió:
>>>>
>>>> Hello Cristiano,
>>>>
>>>> > Isn't there a way (even if it's hackish) to simply clear the
>>>> annotations (and order by)? I know querysets are smart enough to not
>>>> include the order by clause if there's a count.
>>>>
>>>> The ordering is cleared on `count()` because it shouldn't interfere
>>>> with the result in any way. You can clear annotations using
>>>> `queryset.query.annotations.clear()` but be aware that it is a private API
>>>> that could change under your feet. Make sure to make a copy of the
>>>> queryset
>>>> (e.g. copy = queryset.all()) before performing this change as it will
>>>> alter
>>>> the queryset in place.
>>>>
>>>> Best,
>>>> Simon
>>>>
>>>> Le jeudi 23 novembre 2017 22:41:41 UTC-5, Cristiano Coelho a écrit :
>>>>>
>>>>> Hello Simon, thanks for the response.
>>>>>
>>>>> The above code is just an example, the reason behind the annotate
>>>>> because there's some complicated code that builds a queryset and
>>>>> annotates
>>>>> it so it can easily be reused. It works fine 99% of the time except when
>>>>> there's a count involved and it ends up being redundant. The solution
>>>>> would
>>>>> be to not annotate anything and replace the code in multiple places to
>>>>> add
>>>>> the annotate call (or similar using a custom queryset or manager I
>>>>> guess),
>>>>> but that's quite painful and will end up with a lot of duplicated code
>>>>> since there's also an order_by that follows the annotate that needs to be
>>>>> moved over as well
>>>>>
>>>>> Isn't there a way (even if it's hackish) to simply clear the
>>>>> annotations (and order by)? I know querysets are smart enough to not
>>>>> include the order by clause if there's a count.
>>>>>
>>>>> You could also suggest using two separate calls or a flag to pass down
>>>>> to the internal code so it doesn't include the additional stuff, but that
>>>>> wouldn't work since paginators accept only one query set for example and
>>>>> internall uses it for both count and results.
>>>>>
>>>>>
>>>>>
>>>>> El viernes, 24 de noviembre de 2017, 0:05:29 (UTC-3), Simon Charette
>>>>> escribió:
>>>>>>
>>>>>> Hello Cristiano,
>>>>>>
>>>>>> I understand your frustration but please avoid using the developer
>>>>>> mailing list
>>>>>> as a second tier support channel. I suggest you try the IRC #django
>>>>>> channel if
>>>>>> you need to want to get faster support.
>>>>>>
>>>>>> What's happening here is that annotate() really means "select this
>>>>>> field" while
>>>>>> in your other case you use a lookup (summary__icontains) which are
>>>>>> only going to
>>>>>> be added to the WHERE clause of your query.
>>>>>>
>>>>>> I'm not sure why you are annotating your queryset without referring
>>>>>> to it in
>>>>>> a filter clause later on but the ORM cannot simply ignore it when you
>>>>>> are
>>>>>> performing your `count()` because some annotations could interfere
>>>>>> with grouping
>>>>>> somehow.
>>>>>>
>>>>>> There is an open ticket[0] to add support for an `alias()` method
>>>>>> that would
>>>>>> allow the ORM to clear/ignore the specified expressions if it's not
>>>>>> referenced
>>>>>> in the query.
>>>>>>
>>>>>> In the mean time I think the best approach would be to avoid
>>>>>> annotating the
>>>>>> queryset if your don't need to reference the score.
>>>>>>
>>>>>> Cheers,
>>>>>> Simon
>>>>>>
>>>>>> [0] https://code.djangoproject.com/ticket/27719
>>>>>>
>>>>>> Le mardi 21 novembre 2017 08:46:21 UTC-5, Cristiano Coelho a écrit :
>>>>>>>
>>>>>>> Hmm, should I try with the dev mailing list? Guess it's something no
>>>>>>> one faced before?
>>>>>>>
>>>>>>> El martes, 14 de noviembre de 2017, 22:54:23 (UTC-3), Cristiano
>>>>>>> Coelho escribió:
>>>>>>>>
>>>>>>>> I'm getting some very odd query when combining annotate with count.
>>>>>>>> See the following:
>>>>>>>>
>>>>>>>> >>> q =
>>>>>>>>> Vulnerability.objects.annotate(score=WordTrigramCustomSimilarity('test','summary'))
>>>>>>>>> >>> q.count()
>>>>>>>>> 3094
>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>> 'SELECT COUNT(*)
>>>>>>>>> FROM (
>>>>>>>>> SELECT "vulnerabilities_vulnerability"."id" AS Col1,
>>>>>>>>> custom_word_similarity(\'test\',
>>>>>>>>> "vulnerabilities_vulnerability"."summary")
>>>>>>>>> AS "score"
>>>>>>>>> FROM "vulnerabilities_vulnerability"
>>>>>>>>> GROUP BY "vulnerabilities_vulnerability"."id",
>>>>>>>>> custom_word_similarity(\'test\',
>>>>>>>>> "vulnerabilities_vulnerability"."summary")
>>>>>>>>> ) subquery
>>>>>>>>> >>> q2 = Vulnerability.objects.filter(summary__icontains='test')
>>>>>>>>> >>> q2.count()
>>>>>>>>> 33
>>>>>>>>> >>> print connection.queries[-1]
>>>>>>>>> 'SELECT COUNT(*) AS "__count"
>>>>>>>>> FROM "vulnerabilities_vulnerability"
>>>>>>>>> WHERE UPPER("vulnerabilities_vulnerability"."summary"::text) LIKE
>>>>>>>>> UPPER(\'%test%\')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Custom function code, is this what's causing the odd count
>>>>>>>> behavior? Did I miss anything?
>>>>>>>>
>>>>>>>> class WordTrigramCustomSimilarity(Func):
>>>>>>>>> function = 'custom_word_similarity'
>>>>>>>>> def __init__(self, string, expression, **extra):
>>>>>>>>> if not hasattr(string, 'resolve_expression'):
>>>>>>>>> string = Value(string)
>>>>>>>>> super(WordTrigramCustomSimilarity, self).__init__(string,
>>>>>>>>> expression, output_field=FloatField(), **extra)
>>>>>>>>
>>>>>>>>
>>>>>>>> I would expect for the query to be a simple count, rather than a
>>>>>>>> nested query with a useless group by (correct me if I'm wrong).
>>>>>>>> The issue gets even worse if the function is expensive, since it
>>>>>>>> gets called when it's not needed at all, more than once.
>>>>>>>> Also the issue behaves pretty much the same if the queryset
>>>>>>>> includes filtering and ordering but I didn't include it here for
>>>>>>>> simplicity.
>>>>>>>>
>>>>>>>> Using Django 1.11.7 + postgres (psycopg) backend.
>>>>>>>>
>>>>>>>
--
You received this message because you are subscribed to the Google Groups
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/cbdc0b45-1a55-497a-8263-a32a03a201bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.