Hey All

I have an interesting question about nested aggregates which was posed by some 
colleagues that I'm trying to figure out.

The sample data is as follows:

@prefix ex: <http://example.org/meals#> .

[] ex:mealPrice 25 ; ex:mealTip 7 .
[] ex:mealPrice 50 ; ex:mealTip 10 .
[] ex:mealPrice 100 ; ex:mealTip 25 .

The question they were trying to answer is: what would my meals cost on average 
if I always tipped at my best percentage?  The original query they came up 
with was as follows:

PREFIX ex: <http://example.org/meals#>
SELECT (AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice))) AS ?avgCostWithBestTip)
WHERE {
  ?description ex:mealPrice ?mealPrice .
  ?description ex:mealTip ?mealTip .
} GROUP BY ?description

Now this looks reasonable enough but is in fact incorrect: they added a 
spurious GROUP BY, so if the query worked it would actually calculate the total 
price of each individual meal.  (It runs in dotNetRDF but gives an incorrect 
answer due to a previously undiscovered scoping bug with nested aggregates.)
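
As a quick sanity check on the arithmetic, here is a plain Python sketch of 
the two readings of the query against the sample data.  It is not run through 
ARQ or dotNetRDF, and the variable names are just for illustration.  It 
assumes the intended meaning is a single best tip ratio across all meals, 
applied to every meal.

```python
# Sample data from above as (mealPrice, mealTip) pairs.
meals = [(25, 7), (50, 10), (100, 25)]

# Intended reading: one global best tip ratio, applied to every meal.
best_ratio = max(tip / price for price, tip in meals)  # 7/25 = 0.28
avg_with_best_tip = sum(price * (1.0 + best_ratio) for price, _ in meals) / len(meals)

# With GROUP BY ?description each group holds a single meal, so MAX only sees
# that meal's own ratio and the expression collapses to price + tip.
per_meal_totals = [price * (1.0 + tip / price) for price, tip in meals]

print(round(avg_with_best_tip, 2))             # 74.67
print([round(t, 2) for t in per_meal_totals])  # [32.0, 60.0, 125.0]
```

So the answer being looked for is a single row of roughly 74.67, whereas the 
grouped query would at best return three rows, one per meal.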

With ARQ at least, this query doesn't work, although the SPARQL algebra 
generated looks semi-reasonable.  The problem is that while it moves the inner 
MAX() aggregate out to be evaluated before the outer AVG(), it then fails to 
substitute ?.0 into the AVG, leaving the original MAX in place.  This seems to 
lead to an evaluation failure in the AVG, so we get unbound values for each 
result.  (dotNetRDF gives bound values, but they are incorrect due to the 
scoping issue.)

(base <http://example/base/>
  (prefix ((ex: <http://example.org/meals#>))
    (project (?avgCostWithBestTip)
      (extend ((?avgCostWithBestTip ?.1))
        (group (?description) ((?.0 (max (/ ?mealTip ?mealPrice))) (?.1 (avg (* ?mealPrice (+ 1.0 (max (/ ?mealTip ?mealPrice)))))))
          (quadpattern
            (quad <urn:x-arq:DefaultGraphNode> ?description rdf:mealPrice ?mealPrice)
            (quad <urn:x-arq:DefaultGraphNode> ?description rdf:mealTip ?mealTip)
          ))))))

Regardless of the correctness of the query with respect to the original 
question (which is easily fixable by just stripping off the GROUP BY clause), 
it still appears that ARQ is not generating entirely correct algebra here.  It 
looks like it is trying to do the right thing but only partially succeeds.

So two questions:

  1.  Are nested aggregates permitted?  The grammar says yes, so I'm assuming so.
  2.  Is there a bug in ARQ's implementation of this?

I'll poke around in the source code myself, and if it is a bug maybe it's an 
easy fix, but I imagine Andy can answer this much faster than I can.  From what 
I've found so far, it looks like ARQ does aim to intern and reuse aggregates, 
but that doesn't seem to be working properly in this case, so maybe there is 
some subtle bug that I can't see due to my lack of knowledge of the code :-S

Rob
