On 3/2/06, Rock <[EMAIL PROTECTED]> wrote:
>
> My other changeset developed at this week's Django Sprint is much more
> extensive and is best explained by the new section that I am adding to
> the DB API docs.
>
> Feedback is welcome.

Truth be told, a desire to improve aggregates in Django is the reason
that I got involved a few months back (I have a use case that requires
pretty extensive use of aggregates).  I was going to wait until post
magic-removal merge to worry about any aggregate proposals - worry
about getting MR out the door first, then worry about other
improvements.

However, if you're going to start playing, I don't to miss out on any
fun :-) Here are some comments on your proposals (including the count
patch from the other thread):

1) I'm -1 on the get_aggregate[s] notation you are suggesting. To me,
as a name, 'get_aggregates' only makes sense as a same if you know
SQL, and the syntax requires end users to learn two notations for
aggregates - avg() and 'AVG', with the latter being very SQL, and not
very python/ORM.

If you are willing to use raw SQL (like get_aggregate
psuedo-requires), you can already do what you are proposing with
relative ease using a select kwarg. select kwargs also allows the
definition of aggregate and non-aggregate 'price+tax+tip' fields.

Also, your proposal doesn't cross-multiply very well - your proposal
is fine if I want 1-n summary stats of 1 field, but not if I want 1-n
summary stats of 1-n fields. (i.e., sum of field 1, avg of field 2).

2) I'm also -1 on the count modification you proposed.
Article.objects.all().count() reads quite obviously as the count of
all articles. However, Article.objects.all().count('title') reads (to
me) like an inelegant way of saying 'a count of all titles'. This
isn't what your proposal would return. What you are proposing can
already be acheived using
Article.objects.filter(fieldname__isnull=False).count(), which, IMHO,
reads better, and is more flexible.

3) I'm +0 on the min, max, avg, stddev and median (where available)
suggestions as you describe. They are all reasonable analogs of the
count() function, and I don't think it would necessarily hurt if they
were to be included. However, I have three things preventing me from
saying +1:

Firstly (and trivially), I would say average() not avg(). The other
abbreviations don't bother me, but for some reason avg() rubs me the
wrong way.

Secondly, in the absence of get_aggregates, if you want to get the
minimum AND the maximum, you need to make two separate SQL queries.
Obviously, this is why you suggested get_aggregates - I just think we
need a better syntax.

A couple of quick suggestions for an alternate syntax for multiple
aggregates in a single query (these ideas are in foetal form - like I
said earlier, I wasn't expecting to have this discussion for a while):

>>> Article.objects.summary(count=True, min='field', average=['field',
'field'], ...)
{'count' = 5, 'min'={'field':1}, 'average'={'field': 3, 'field': 4} ...}

i.e., a kwarg for each aggregate function; COUNT taking true/false,
and SUM/MIN/MAX/AVG/STDEV/MEDIAN taking a single field name, or a
tuple/list of field names. The return value is a dictionary populated
with the summary statistics.

Alternatively, the return value could also be an object:
>>> s = Article.objects.summary(...)
>>> s.count
5
>>> s.min['field']  # (or maybe s.field.min)
1

You could also rearrange the summary method to make the field names
the kwargs of the method:
>>> Article.objects.summary(field1=sum, field2=(sum,average),...)
if you make sum, average, etc functions that can be referenced as
objects. However, this makes count harder to implement (since it
doesn't need to operate on a field).

These options would all allow 1-n fields with 1-n aggregates, and
avoids 'SQL string' notation in ORM queries.

Thirdly, (and most significantly), while avg(), min(), max() etc are
reasonable representations, IMHO they only deal with a small part of
the aggregate story.

Your suggestions all deal with aggregates in the 'summary' sense -
give me an average of a field for these objects. However, in this
context, the field must belong to the object being averaged, and a
query produces either a summary value or a list of objects - you don't
produce both.

While this is important (especially for things like count), IMHO this
misses one very important use case of aggregates: those queries where
you need both a list of objects AND a summary value (or, to put it
another way, queries where the summary value isn't over the table
whose objects we are retreiving).

For example:

class Machine(Model):
   id = CharField()

class Part(Model):
   name = CharField()
   weight = FloatField()
   machine = ForeignKey(Machine)

What is the total weight of each machine? The machine consists of
several parts, and the total_weight is the sum of all weights of
related parts.

What you need to be able to do here is to annotate the results
returned by your model with some summary stats. You can currently do
this using select, tables and where kwargs on a filter; however, these
are somewhat complex to set up. IMHO select/where/tables is a
necessary evil; while it is required to allow spanning coverage of
SQL, we should be striving to minimize the use of select/where/tables,
especially when we can find an object-like subset of use cases.

So, I would be aiming to be able to do something like:

>>> m = Machine.objects.annotate('total_weight', part__weight=sum).all()
>>> m[0].total_weight
123

This would do the join required to get weight from the part related
objects, add an extra column to the results returned by SQL, and add
an attribute to objects with the provided alias. You can add multiple
annotations if so required, and filter (or otherwise process) an
annotated query set.

Unresolved issues that I would like to visit:
- queries over annotated fields
- summary stats that are used for queries, but not returned to the end user
- interaction of annotations and queries with group_by/having at the SQL level
- unifying notation for 'summary' and 'annotate'

Like I said earlier, I wasn't expecting to have this conversation for
a while, so my ideas are still a little underdeveloped, but I hope you
get an idea of where I am heading.

Russ Magee %-)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---

Reply via email to