So two questions:
1. Are nested aggregates permitted? The grammar says yes, so I'm assuming yes
2. Is there a bug in ARQ's implementation of this?
1 - Yes, it's legal syntax ... but that can be changed :-) I'm going to pass this
on to the WG.
2 - What ARQ does is calculate the aggregates of a group as the group
is seen, not at the end of the group block when all the elements are known.
A line of argument is that the expression inside the aggregate is
applied to each row, so only row variables are in scope. The aggregate
AVG(max(?x)+1) violates that. So for streaming and for scoping, I'd
argue it's wrong - is there a use case that argues for it?
The spec needs clarifying if this is to be illegal syntax. The simplest
solution is to disallow nested aggregate expressions.
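A minimal sketch of the "no nested aggregate expressions" rule as a check over a parsed expression. The tuple-based tree (loosely modelled on the SSE that ARQ prints) and the names `AGGREGATES` and `has_nested_aggregate` are hypothetical illustrations, not ARQ's parser or API:

```python
# Hypothetical expression tree: tuples (op, *args); variables are strings
# starting with "?", literals are numbers. This is NOT ARQ's API.
AGGREGATES = {"count", "sum", "min", "max", "avg", "sample", "group_concat"}

def has_nested_aggregate(expr, inside_aggregate=False):
    """Return True if an aggregate call appears inside another aggregate's argument."""
    if not isinstance(expr, tuple):
        return False  # variable or literal: nothing nested here
    op, *args = expr
    is_agg = op in AGGREGATES
    if is_agg and inside_aggregate:
        return True  # an aggregate found within an enclosing aggregate
    # Recurse, remembering whether we are now inside an aggregate's argument
    return any(has_nested_aggregate(a, inside_aggregate or is_agg) for a in args)

# AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice))): rejected
nested = ("avg", ("*", "?mealPrice", ("+", 1.0, ("max", ("/", "?mealTip", "?mealPrice")))))
# AVG(max(?x)+1): rejected
avg_of_max = ("avg", ("+", ("max", "?x"), 1))
# MAX(?mealTip / ?mealPrice) on its own: accepted
flat = ("max", ("/", "?mealTip", "?mealPrice"))

print(has_nested_aggregate(nested))      # True
print(has_nested_aggregate(avg_of_max))  # True
print(has_nested_aggregate(flat))        # False
```

Aggregates that sit side by side, as in `max(?x) + avg(?y)`, are still accepted. Only nesting is rejected.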
Here is a related example:
SELECT (max(?x) AS ?M) (avg(?M+1) AS ?A)
It is related because the select expression rules say you can use ?M inside AVG().
Not sure what SQL says about this; the SPARQL processing model uses the
same framework as SQL.
The failure in ARQ is because AVG is calculating the sum on each row of
a group as each row comes in (and the row can be thrown away afterwards -
only the key and the aggregators need to be kept), so it runs before MAX()
is ready overall, and possibly before MAX() has even been called with the
current row (undefined ordering).
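The following toy model makes the ordering problem concrete. It assumes a hypothetical engine in which each aggregate keeps only an accumulator and AVG's argument reads MAX's accumulator as it stands when the row arrives. It is not ARQ's code. `MaxAcc`, `AvgAcc`, `streaming_nested` and `two_pass` are made-up names, and exact `Fraction` arithmetic stands in for xsd:decimal. It uses the meal data from Rob's original message and treats the whole result as one group:

```python
from fractions import Fraction

# Sample data from the thread: (mealPrice, mealTip)
MEALS = [(25, 7), (50, 10), (100, 25)]

class MaxAcc:
    """Streaming MAX: keeps only the running maximum."""
    def __init__(self):
        self.value = None
    def update(self, v):
        self.value = v if self.value is None else max(self.value, v)

class AvgAcc:
    """Streaming AVG: keeps a running sum and count; an error poisons it."""
    def __init__(self):
        self.total, self.count, self.error = Fraction(0), 0, False
    def update(self, v):
        if v is None:
            self.error = True  # expression error -> aggregate ends up unbound
        else:
            self.total += v
            self.count += 1
    def result(self):
        if self.error:
            return None
        return self.total / self.count if self.count else Fraction(0)

def streaming_nested(rows, max_first):
    """Hypothetical naive engine: AVG's argument reads MAX's *current* state."""
    mx, avg = MaxAcc(), AvgAcc()
    for price, tip in rows:  # each row is seen once, then discarded
        ratio = Fraction(tip, price)
        if max_first:
            mx.update(ratio)
        # ?mealPrice * (1.0 + MAX(...)), using whatever MAX holds right now
        avg.update(None if mx.value is None else price * (1 + mx.value))
        if not max_first:
            mx.update(ratio)
    return avg.result()

def two_pass(rows):
    """Intended semantics: finish MAX over all rows first, then compute AVG."""
    best = max(Fraction(tip, price) for price, tip in rows)
    return sum(price * (1 + best) for price, _ in rows) / len(rows)

print(two_pass(MEALS))                                          # 224/3
print(streaming_nested(MEALS, max_first=False))                 # None
print(streaming_nested(MEALS[1:] + MEALS[:1], max_first=True))  # 217/3
```

If AVG is updated before MAX, the first row sees no MAX value at all. The expression errors and AVG ends up unbound, which resembles the unbound results Rob reports. If MAX is updated first, the answer depends on row order. It comes out right for the original order only because the best-tipped meal happens to be first.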
Andy
On 01/06/12 18:15, Rob Vesse wrote:
Just a thought - for those who may be interested, here is a version of the
query that does work as expected with the current ARQ snapshot.
It removes the GROUP BY, so the query actually answers the question of
interest, and moves the MAX() into a subquery to force evaluation order and
scope:
PREFIX ex:<http://example.org/meals#>
SELECT (AVG(?mealPrice * (1.0 + ?MaxTipPercent)) AS ?avgCostWithBestTip)
WHERE
{
  ?description ex:mealPrice ?mealPrice .
  {
    SELECT (MAX(?mealTip / ?mealPrice) AS ?MaxTipPercent)
    WHERE
    {
      ?description ex:mealPrice ?mealPrice .
      ?description ex:mealTip ?mealTip .
    }
  }
}
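Here is a small model of why the subquery version works. It assumes bottom-up evaluation of the sub-SELECT, so the sub-SELECT is aggregated to a single row first. Because that sub-SELECT projects only ?MaxTipPercent, it shares no variables with the outer pattern. The join is therefore a cross product that gives every meal row the same, completed maximum. The triples, the `_:m1`..`_:m3` labels and the helper names `match` and `join` are illustrative, and `Fraction` stands in for xsd:decimal arithmetic:

```python
from fractions import Fraction

# Sample data as triples; blank-node labels are illustrative.
TRIPLES = [
    ("_:m1", "mealPrice", 25), ("_:m1", "mealTip", 7),
    ("_:m2", "mealPrice", 50), ("_:m2", "mealTip", 10),
    ("_:m3", "mealPrice", 100), ("_:m3", "mealTip", 25),
]

def match(pred, var):
    """Solutions for the pattern ?description <pred> ?var."""
    return [{"description": s, var: o} for s, p, o in TRIPLES if p == pred]

def join(left, right):
    """SPARQL-style join: merge every compatible pair of solution mappings."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Inner sub-SELECT: MAX(?mealTip / ?mealPrice) over its single implicit group.
inner_rows = join(match("mealPrice", "mealPrice"), match("mealTip", "mealTip"))
inner = [{"MaxTipPercent": max(Fraction(r["mealTip"], r["mealPrice"]) for r in inner_rows)}]

# Outer query: no shared variables with `inner`, so the join is a cross product.
outer_rows = join(match("mealPrice", "mealPrice"), inner)
values = [r["mealPrice"] * (1 + r["MaxTipPercent"]) for r in outer_rows]
print(sum(values) / len(values))  # 224/3
```

The result, 224/3 (about 74.67), is the average meal price, 175/3, scaled by the best tip ratio 1 + 7/25 = 1.28.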
Rob
On 6/1/12 10:11 AM, "Rob Vesse" wrote:
Hey All
I have an interesting question about nested aggregates which was posed by
some colleagues that I'm trying to figure out.
The sample data is as follows:
@prefix ex:<http://example.org/meals#> .
[] ex:mealPrice 25 ; ex:mealTip 7 .
[] ex:mealPrice 50 ; ex:mealTip 10 .
[] ex:mealPrice 100 ; ex:mealTip 25 .
The question they were trying to answer is: what would my meals cost on
average if I always tipped at my best percentage? The original
query they came up with was as follows:
PREFIX ex:<http://example.org/meals#>
SELECT (AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice))) AS ?avgCostWithBestTip)
WHERE {
  ?description ex:mealPrice ?mealPrice .
  ?description ex:mealTip ?mealTip .
} GROUP BY ?description
Now this looks reasonable enough, but it is in fact incorrect because they
added a spurious GROUP BY, so if the query worked it would actually be
calculating the total price of each individual meal. (It works in dotNetRDF
but gives an incorrect answer due to a previously undiscovered scoping bug
with nested aggregates.)
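The effect of the spurious GROUP BY can be checked with a short sketch. It assumes each meal's blank node is a distinct ?description, and the `_:m1`..`_:m3` labels are illustrative. Each group then holds a single row, so MAX(?mealTip / ?mealPrice) is just that meal's own ratio and the AVG collapses to ?mealPrice + ?mealTip:

```python
from collections import defaultdict
from fractions import Fraction

# (?description, ?mealPrice, ?mealTip) rows; the labels are illustrative.
ROWS = [("_:m1", 25, 7), ("_:m2", 50, 10), ("_:m3", 100, 25)]

def grouped_by_description(rows):
    """Evaluate AVG(price * (1 + MAX(tip / price))) within each ?description group."""
    groups = defaultdict(list)
    for desc, price, tip in rows:
        groups[desc].append((price, tip))  # each meal is its own node: one row per group
    result = {}
    for desc, members in groups.items():
        best = max(Fraction(tip, price) for price, tip in members)  # MAX within this group only
        result[desc] = sum(price * (1 + best) for price, _ in members) / len(members)
    return result

# Prints "_:m1 32", "_:m2 60", "_:m3 125": each is just price + tip
for desc, value in grouped_by_description(ROWS).items():
    print(desc, value)
```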
With ARQ, at least, this query doesn't work, although the generated SPARQL
algebra looks semi-reasonable. The problem is that while it moves the inner
MAX() aggregate out to be evaluated before the outer AVG(), it fails to then
substitute ?.0 into the AVG, leaving the original MAX in place. This seems
to lead to an evaluation failure in the AVG, so we get unbound values for
each result. (dotNetRDF gives bound values, but the values are incorrect
due to a scoping issue.)
(base <http://example/base/>
  (prefix ((ex: <http://example.org/meals#>))
    (project (?avgCostWithBestTip)
      (extend ((?avgCostWithBestTip ?.1))
        (group (?description)
               ((?.0 (max (/ ?mealTip ?mealPrice)))
                (?.1 (avg (* ?mealPrice (+ 1.0 (max (/ ?mealTip ?mealPrice)))))))
          (quadpattern
            (quad <urn:x-arq:DefaultGraphNode> ?description rdf:mealPrice ?mealPrice)
            (quad <urn:x-arq:DefaultGraphNode> ?description rdf:mealTip ?mealTip)
          ))))))
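For comparison, here is a sketch of the substitution Rob expected. Each aggregate is pulled out, innermost first, and given a ?.N variable, and the enclosing expression is rewritten to refer to that variable. It works over a hypothetical tuple encoding of the expression and is not ARQ's implementation. `extract_aggregates` and `to_sse` are made-up names:

```python
AGGREGATES = {"count", "sum", "min", "max", "avg", "sample"}

def to_sse(e):
    """Render a tuple tree in the parenthesised prefix form ARQ prints."""
    if isinstance(e, tuple):
        return "(" + " ".join(to_sse(x) for x in e) + ")"
    return str(e)

def extract_aggregates(expr, table):
    """Replace each aggregate call, innermost first, by a ?.N variable.

    `table` maps an aggregate expression to its variable, so an identical
    aggregate appearing twice is interned and reuses the same variable.
    """
    if not isinstance(expr, tuple):
        return expr  # variable or literal
    op, *args = expr
    rebuilt = (op, *(extract_aggregates(a, table) for a in args))
    if op in AGGREGATES:
        if rebuilt not in table:
            table[rebuilt] = f"?.{len(table)}"
        return table[rebuilt]
    return rebuilt

# AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice)))
query_expr = ("avg", ("*", "?mealPrice", ("+", 1.0, ("max", ("/", "?mealTip", "?mealPrice")))))
table = {}
top = extract_aggregates(query_expr, table)
for agg, var in table.items():
    print(var, to_sse(agg))
# ?.0 (max (/ ?mealTip ?mealPrice))
# ?.1 (avg (* ?mealPrice (+ 1.0 ?.0)))
```

Even with this rewrite, the argument of ?.1 refers to ?.0, which is a per-group value. AVG therefore cannot be fed any row until MAX has seen the whole group. That needs a second pass over the group, which is the streaming constraint Andy describes in his reply.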
Regardless of the correctness of the query with respect to the original
question (which is easily fixed by just stripping off the GROUP BY clause),
it still appears that ARQ is not generating entirely correct algebra here.
It looks like it is trying to do the right thing but only partially
succeeds.
So two questions:
1. Are nested aggregates permitted? The grammar says yes, so I'm assuming yes
2. Is there a bug in ARQ's implementation of this?
I'll poke around in the source code myself, and if it is a bug maybe it's
an easy fix, but I imagine Andy can answer this much faster than I can.
From what I've found so far, it looks like ARQ does aim to intern and
reuse aggregates, but it doesn't seem to be working properly in this case,
so maybe there is some subtle bug that I can't see due to my lack of
knowledge of the code :-S
Rob