Managed it in the end, so for reference here is what I ended up with:
def total_credit_debt(self):
transaction_totals =
Transaction.objects.filter(member__isnull=False).values('member').annotate(total=Sum('amount')).order_by()
creditors = transaction_totals.filter(total__gt=0)
debtors = transaction_totals.filter(total__lt=0)
total_credit = sum(row['total'] for row in creditors)
total_debt = sum(row['total'] for row in debtors)
return total_credit, total_debt
(@classmethod on TransactionManager)
Unfortunately I didn't manage to get it to do the second sum inside the
database as well, but this is fast enough - there's only a few hundred
members with non-zero balances. If you think there's any way this could be
improved / cleaned up, I'd be happy to hear:)
Thanks!
On Thursday, February 6, 2014 7:59:37 PM UTC, arnonym wrote:
>
> On Wed, 5 Feb 2014 10:11:29 -0800 (PST) ST
> <[email protected]<javascript:>>
>
> wrote:
> > Hi,
> >
> > I'm trying to optimize the run-time of getting total credit and debt
> > values out of our database. Ideally I'd like to formulate it as a
> > Django query. This is the raw SQL query I have, which produces the
> > right answer and is very fast (milliseconds):
> >
> > SELECT sum(tg.total) FROM
> > (
> > SELECT sum(t.amount) AS total, t.member_id AS member_id
> > FROM club_transaction AS t
> > WHERE t.member_id IS NOT NULL
> > GROUP BY t.member_id
> > ) AS tg
> > WHERE tg.total < 0
> >
> > (plus a second query for > 0)
> >
> > My Django implementation was:
> >
> > m =
> > Member.objects.annotate(balance=Sum('transaction__amount')) m_debt =
> > m.filter(balance__lt=0).aggregate(total=Sum('balance')) m_credit =
> > m.filter(balance__gt=0).aggregate(total=Sum('balance'))
> >
> > which looks a lot nicer, is easier to understand and maintain.
> >
> > However, it results in the following SQL query (slightly redacted):
> >
> > SELECT SUM(balance) FROM
> > (
> > SELECT "club_member"."id" AS "id", {all the other fields},
> > SUM("club_transaction"."amount") AS "balance"
> > FROM "club_member"
> > LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" =
> > "auth_user"."id")
> > LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" =
> > "club_transaction"."member_id")
> > GROUP BY "club_member"."id", {all the other fields},
> > "auth_user"."last_name", "auth_user"."first_name"
> > HAVING SUM("club_transaction"."amount") < 0
> > ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC
> > ) subquery
> >
> > (again, plus another one for > 0)
> > which is very slow (almost 1.5 seconds).
> >
> > How can I construct a Django query which doesn't request (and group
> > by) all the unnecessary other fields ?
> > I already tried playing around with only() and values() but never got
> > it to work.
>
> I did something similar a few years back. Its next to impossible with
> django-orm, just do it in raw sql.
> The django-orm can't really do any advanced group_by clauses. And you
> can't give them with extra(). Well, you can give them but they won't be
> used;-)
>
> Doing two (or three) orm-queries and then joining the data in python
> will actually be slower then doing it all in hand-crafted sql.
>
> So just do the sql by hand. And then optimize by having three columns,
> one with the SUM(amount) if amount>0, one with the SUM(amount) if
> amount<0 and one with the SUM(amount). Total credits, total depts and
> balance all in one query (if possible)...
>
> - Arnold
>
--
You received this message because you are subscribed to the Google Groups
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/73bd4ddb-d6ee-461a-bed3-a3ee915b8859%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.