This is a typical *greatest-per-group* problem, there is even a tag 
<https://stackoverflow.com/questions/tagged/greatest-n-per-group> for that 
in SO:
And if you search for questions in this area related to Django 
<https://stackoverflow.com/questions/tagged/greatest-n-per-group+django> 
you will probably get: here 
<https://stackoverflow.com/questions/2111590/how-can-a-do-a-greatest-n-per-group-query-in-django>
 
or here 
<https://stackoverflow.com/questions/2074514/django-query-that-get-most-recent-objects-from-different-categories>
.
My approach to this problem, is to use two queries via prefetch_related.
In your case it would be something like this:

countries = Country.objects.prefetch_related(
    Prefetch('author_set',
        queryset=(Author.objects
                    .filter(book_set__pubdate__lte=SOMEDATE)
                    .annotate(last_publish=Max('book_set__pubdate'))
                    .order_by('-last_publish')
        ),
        to_attr='authors'
    )
)

#usage
for country in countries:
    #get the first author in authors or None if empty
    top_author = next(iter(country.authors), None)


the problem with this approach is that the memory footprint could get very 
big if you have lots of objects per group. In my cases I was always able to 
filter them down to several objects only, so it was not an issue. 

However, thanks to the new staff that come with Django 1.11, now its 
actually possible to build a real greatest-per-group query with the ORM.
This can be done using the new Subquery 
<https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions>
 
and OuterRef 
<https://docs.djangoproject.com/en/1.11/ref/models/expressions/#django.db.models.OuterRef>
 
classess. In your case it must be something like this:

authors = Author.objects.filter(
    book__pubdate=Subquery(
        (Author.objects
            .filter(country=OuterRef('country'))
            .values('country')
            .annotate(max_date=Max('book_set__pubdate'))
            .values('max_date')[:1]
        )
    )
)

But be careful with this approach, because if you have more than one 
maximum from a group, you get all in of them in return.

PS. I've just recently just found out the 2nd approach, its not very clear 
from the docs that it possible, maybe we can/need document it? This is a 
common problem and may be useful to people.



On Saturday, May 13, 2017 at 6:35:32 PM UTC+3, Bernd Wechner wrote:
>
> I'm bamboozled by Django aggregation:
>
> https://docs.djangoproject.com/en/1.11/topics/db/aggregation/
>
> I'd like to add a little to the example they present to clarify my problem 
> (keeping it in familiar territory with regards to the documentation):
>
> from django.db import models
> class Author(models.Model):
>     name = models.CharField(max_length=100)
>     age = models.IntegerField()    country = models.ForeignKey('Country') 
> class Country(models.Model):
>     name = models.CharField(max_length=100)
> class Publisher(models.Model):
>     name = models.CharField(max_length=300)
>     num_awards = models.IntegerField()
> class Book(models.Model):
>     name = models.CharField(max_length=300)
>     pages = models.IntegerField()
>     price = models.DecimalField(max_digits=10, decimal_places=2)
>     rating = models.FloatField()
>     authors = models.ManyToManyField(Author)
>     publisher = models.ForeignKey(Publisher)
>     pubdate = models.DateField()
> class Store(models.Model):
>     name = models.CharField(max_length=300)
>     books = models.ManyToManyField(Book)
>     registered_users = models.PositiveIntegerField()
>
>
> I'd like to list the name and age of the (one) author in each country that 
> published the last book before a given date. 
>
> Ostensibly the only option I have is:
>
>     LatestAuthors = 
> Author.objects.filter(book_set__pubdate__lte=SOMEDATE).values('country').annotate(Max('book_set__pubdate'))
>
> That seems to work generating a query that authors are selected Grouped By 
> country and selecting the Maximum book pubdate ... but alas the queryset is 
> not a set of author objects it is a set of dictionaries that only contain 
> "country", I have no access to the authors name or age!
>
> It seems the values() method is overloaded, and used to both select the 
> values that are in the SELECT and the values that are in the GROUP BY 
> clause of the generated query.
>
> I would like select all the fields of Author but group by only Country.
>
> So that I could for example list the name and age of the most recently 
> published author in each country.
>
> I've read the docs at length and the evidence seems to suggest such an 
> overloading alas, and that I may need to write SQL!
>
> But I would like to think my ignorance is great enough that a better 
> solution exists, to select * on Author while group by Author.country.
>
> If I leave the values clause out, I get multiple Authors listed per 
> country (it is not grouped by country), but I get a set of authors back. If 
> I add a second values() clause:
>
>     LatestAuthors = 
> Author.objects.filter(book_set__pubdate__lte=SOMEDATE).values('country').annotate(Max('book_set__pubdate')).values()
>
> Things get bizarre ... as in I see all the Author fields in the returned 
> dictionaries but the group by reverts to id in Author not country. Grrr.
>
> Regards,
>
> Bernd.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/fe75b108-081a-416d-bef4-a824cee1753b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to