Comments inline:

On 6/3/12 6:05 AM, "Andy Seaborne" <[email protected]> wrote:

>
>> So two questions:
>>
>>   1.  Are nested aggregates permitted?  The grammar says yes so I'm
>>       assuming yes
>>   2.  Is there a bug in ARQ's implementation of this?
>
>
>1 - yes legal syntax ... that can be changed :-)  I'm going to pass this
>on to the WG.

I'm not sure they necessarily need forbidding; I think they may have
some uses.

>
>2 - What ARQ does is to calculate the aggregates of a group as the group
>is seen, not at the end of the group block when all the elements are
>known.
>
>A line of argument is that the expression inside the aggregate is
>applied to each row, so only row variables are in-scope.  The aggregate
>AVG(max(?x)+1) is violating that.  So for streaming and for scoping, I'd
>argue it's wrong - is there a use case that argues for it?

Yes, the reason this works easily in dotNetRDF (after I fixed the scoping
bug) was that dotNetRDF always calculates the full result at every stage
(with some special exceptions for some ASK and LIMITed queries) so when
applying the aggregates all the groups have been calculated and then
aggregates are applied afterwards.
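
As a rough illustration of that "materialise the groups, then aggregate" order, here is a toy Python sketch. It is not dotNetRDF code; MEALS and avg_cost_with_best_tip are names made up for this example, and the rows are the sample data quoted further down the thread.

```python
# Toy model of "materialise everything, then aggregate" evaluation.
# Not dotNetRDF code: MEALS and avg_cost_with_best_tip are illustrative names.

# (mealPrice, mealTip) pairs from the sample data in this thread
MEALS = [(25, 7), (50, 10), (100, 25)]


def avg_cost_with_best_tip(rows):
    """Evaluate AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice))) over one group."""
    rows = list(rows)  # the whole group is known before any aggregate runs
    # Inner aggregate first: it needs every row of the group
    best_tip = max(tip / price for price, tip in rows)
    # Outer aggregate second: the inner result is now a constant for the group
    costs = [price * (1.0 + best_tip) for price, _ in rows]
    return sum(costs) / len(costs)


# No GROUP BY: a single group containing all meals
print(avg_cost_with_best_tip(MEALS))
# GROUP BY ?description: each meal is its own group
print([avg_cost_with_best_tip([meal]) for meal in MEALS])
```

With one meal per group the inner MAX is just that meal's own tip ratio, so each result collapses to price plus tip, which is why the spurious GROUP BY changes the question being answered.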

I think because of the potential scoping confusion and the fact that
allowing nesting may make implementation much harder for streaming
implementations such as ARQ, it is certainly worth the working group
discussing this.  Do you want me to make a formal comment to the working
group?

>
>The spec needs clarifying if it is to be bad syntax.  The simple
>solution is no nested aggregate expressions.
>
>Here is a related example:
>
>SELECT (max(?x) As ?M) (avg(?M+1) AS ?A)
>
>because the select expression rules say you can use ?M inside AVG().

I tried that but it didn't work in ARQ either, so not sure if that is a bug.

>
>Not sure what SQL says about this - the SPARQL processing model is the
>same framework as SQL.
>
>The failure in ARQ is because AVG is calculating the sum on each row of
>a group as each row comes in (and the row can be thrown away afterward -
>just the key and aggregators need be kept) so it's before MAX() is ready
>overall and even possibly before it has been called even with the
>current row (undefined ordering).

As I noted above, even separating out the aggregates and using the variable
for the max inside the average seemed not to work, so this may be another
ARQ issue?
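
To make the streaming problem concrete, here is a toy single-pass accumulator in Python. It is only a sketch and not ARQ code: the class name and fields are hypothetical, and it assumes the running MAX is updated before AVG sees each row, which the undefined ordering does not guarantee.

```python
# Toy model of single-pass (streaming) group aggregation.
# Not ARQ code: the class and its fields are hypothetical.

class StreamingAvgOfNestedMax:
    """Tries to compute AVG(?mealPrice * (1.0 + MAX(?mealTip / ?mealPrice))) row by row."""

    def __init__(self):
        self.running_max = None  # MAX seen so far, not the final MAX
        self.total = 0.0
        self.count = 0

    def accept(self, price, tip):
        ratio = tip / price
        if self.running_max is None or ratio > self.running_max:
            self.running_max = ratio
        # AVG must fold the row in now, because the row is discarded afterwards
        self.total += price * (1.0 + self.running_max)
        self.count += 1

    def result(self):
        return self.total / self.count


def run(rows):
    acc = StreamingAvgOfNestedMax()
    for price, tip in rows:
        acc.accept(price, tip)
    return acc.result()


# Same three meals from the sample data, delivered in two different orders
best_first = run([(25, 7), (50, 10), (100, 25)])  # best tip ratio arrives first
best_last = run([(50, 10), (100, 25), (25, 7)])   # best tip ratio arrives last
print(best_first, best_last)
```

The two orders give different averages because earlier rows are folded in against a partial MAX, so a streaming engine either has to buffer the group or evaluate the inner aggregate in a separate pass such as a subquery.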

Rob

>
>       Andy
>
>On 01/06/12 18:15, Rob Vesse wrote:
>> Just a thought - for those who may be interested here is a version of the
>> query that does work as expected with the current ARQ snapshot
>>
>> It removes the GROUP BY so the query actually answers the question of
>> interest and moves the MAX() into a subquery to force evaluation order and
>> scope:
>>
>> PREFIX ex:<http://example.org/meals#>
>> SELECT (AVG(?mealPrice * (1.0 + ?MaxTipPercent)) AS ?avgCostWithBestTip)
>> WHERE
>> {
>>    ?description ex:mealPrice ?mealPrice .
>>    {
>>      SELECT (MAX(?mealTip / ?mealPrice) AS ?MaxTipPercent)
>>      WHERE
>>      {
>>        ?description ex:mealPrice ?mealPrice .
>>        ?description ex:mealTip ?mealTip .
>>      }
>>    }
>> }
>>
>> Rob
>>
>>
>>
>>
>> On 6/1/12 10:11 AM, "Rob Vesse"<[email protected]>  wrote:
>>
>>> Hey All
>>>
>>> I have an interesting question about nested aggregates which was posed by
>>> some colleagues that I'm trying to figure out.
>>>
>>> The sample data is as follows:
>>>
>>> @prefix ex:<http://example.org/meals#>  .
>>>
>>> [] ex:mealPrice 25 ; ex:mealTip 7 .
>>> [] ex:mealPrice 50 ; ex:mealTip 10 .
>>> [] ex:mealPrice 100 ; ex:mealTip 25 .
>>>
>>> The question they were trying to answer is what would my meals cost on
>>> average if one always tipped at their best percentage.  The original
>>> query they came up with was as follows:
>>>
>>> PREFIX ex:<http://example.org/meals#>
>>> SELECT (AVG(?mealPrice * (1.0 + MAX( ?mealTip / ?mealPrice))) AS
>>> ?avgCostWithBestTip)
>>> WHERE {
>>>   ?description ex:mealPrice ?mealPrice .
>>>   ?description ex:mealTip ?mealTip .
>>> } GROUP BY ?description
>>>
>>> Now this looks reasonable enough but is in fact incorrect because they
>>> added a spurious GROUP BY, so it would actually be calculating the total
>>> price of each individual meal if the query worked.  (It works in dotNetRDF
>>> but gives an incorrect answer due to a previously undiscovered scoping bug
>>> with nested aggregates)
>>>
>>> With ARQ at least this query doesn't work, although the SPARQL algebra
>>> generated looks semi-reasonable. The problem is that while it moves the
>>> inner MAX() aggregate out to be evaluated before the outer AVG() it fails
>>> to then substitute the ?.0 into the AVG, leaving the original MAX in place,
>>> and this seems to lead to an evaluation failure in the AVG and so we get
>>> unbound values for each result.  (dotNetRDF gives bound values, just the
>>> values are incorrect due to a scoping issue)
>>>
>>> (base<http://example/base/>
>>>   (prefix ((ex:<http://example.org/meals#>))
>>>     (project (?avgCostWithBestTip)
>>>       (extend ((?avgCostWithBestTip ?.1))
>>>         (group (?description)
>>>                ((?.0 (max (/ ?mealTip ?mealPrice)))
>>>                 (?.1 (avg (* ?mealPrice (+ 1.0 (max (/ ?mealTip ?mealPrice)))))))
>>>           (quadpattern
>>>             (quad<urn:x-arq:DefaultGraphNode>  ?description rdf:mealPrice ?mealPrice)
>>>             (quad<urn:x-arq:DefaultGraphNode>  ?description rdf:mealTip ?mealTip)
>>>           ))))))
>>>
>>> Regardless of the correctness of the query with respect to the original
>>> question (which is easily fixable by just stripping off the GROUP BY
>>> clause) it still appears that ARQ is not generating entirely correct
>>> algebra here.  It looks like it is trying to do the right thing but only
>>> partially succeeds.
>>>
>>> So two questions:
>>>
>>>   1.  Are nested aggregates permitted?  The grammar says yes so I'm
>>> assuming yes
>>>   2.  Is there a bug in ARQ's implementation of this?
>>>
>>> I'll poke around in the source code myself and maybe if it is a bug it's
>>> an easy fix but I imagine Andy can answer this much faster than I can.
>>> From what I've found so far it looks like ARQ does aim to intern and
>>> reuse aggregates but it doesn't seem to be working properly in this case
>>> so maybe some subtle bug that I can't see due to lack of knowledge of the
>>> code :-S
>>>
>>> Rob
>>
>
