Hey Alejandro,

Thanks for identifying this problem and presenting a solution.  I
appreciate the solution but think that we should really
update CollapsingAggregateROP to correctly operate against the within
segment key, which sounds like it is the core issue.  Can you file a jira
against this and hopefully someone will pick up the correction of the
behavior.  Clearly we need to add some more tests to validate the full
output of the donuts example as it was, in fact, working correctly at one
point.  (I'm probably the one that broke it.)  If you wanted to add an
additional jira for CollapsingAggregateROP test cases and maybe make one or
two, that would be very helpful.

Thanks again,
Jacques


On Tue, Mar 19, 2013 at 8:56 PM, Alejandro Bellogin Kouki <
[email protected]> wrote:

> Hi all,
>
> this morning I attended the Drill workshop in Amsterdam, and as other
> couple of people, my colleagues and I found a bug regarding the
> simple_plan.json query. Its original output was:
> {
>  "sales" : 109.71,
>  "typeCount" : 2,
>  "quantity" : 159,
>  "ppu" : 0.55
> }
> {
>  "sales" : 184.25,
>  "typeCount" : 2,
>  "quantity" : 335,
>  "ppu" : 0.55
> }
>
> Notice that both "ppu" values are the same, whereas the value for the
> first should be 0.69 (109.71/159) and for the second 0.55.
>
> So, after digging a little bit (maybe too much, considering the time of
> writing this email) into the source code, I managed to generate the desired
> output. For that, I have changed both the simple_plan.json and some code in
> CollapsingAggregateROP that is incompatible with the current description in
> the Apache Drill Plan Syntax. Mainly because of the latter, I preferred to
> start some discussion here instead of in the JIRA ticket, but if you want
> me to file the JIRA first, I will do it (please, take into account I am a
> complete newbie).
>
> Well, basically my solution involves changing the aggregate operation as
> follows (notice now it has a target):
>          op: "collapsingaggregate",
>          within: "ppusegment",
>          carryovers: [ "donuts.ppu" ],
>          target: "donuts.ppu",
>          aggregations: [
>            { ref: "donuts.typeCount",  expr: "count(1)" },
>            { ref: "donuts.quantity",  expr: "sum(quantity)" },
>            { ref: "donuts.sales",  expr: "sum(donuts.ppu * quantity)" }
>          ]
> To make this works, I have had to ignore the fact that "*will draw the
> carryover variables from a record where the target field references has a
> true value*" [ADPS]. When no target is used, the carryover contains a
> pointer to the wrong register in method writeOutputRecord (whereas in the
> method consumeCurrent -- where the condition for target is checked -- the
> instance of the register is the one of the current segment).
>
> I acknowledge that this solution is just a workaround, since it does not
> comply with the ADPS, but I hope at least it serves to give some
> indications about where (and how) a real solution could be found (i.e., the
> carryovers should to be computed when they point to the actual register).
>
> Regards,
> Alejandro
>
> PS: an alternative solution would be to ignore the carryovers from the
> initial query plan and (somehow) be able to print also the ppu field in the
> projection stage.
>
> --
>  Alejandro Bellogin Kouki
>  
> http://rincon.uam.es/dir?cw=**435275268554687<http://rincon.uam.es/dir?cw=435275268554687>
>
>

Reply via email to