Hi all,

this morning I attended the Drill workshop in Amsterdam, and as other couple of people, my colleagues and I found a bug regarding the simple_plan.json query. Its original output was:
{
 "sales" : 109.71,
 "typeCount" : 2,
 "quantity" : 159,
 "ppu" : 0.55
}
{
 "sales" : 184.25,
 "typeCount" : 2,
 "quantity" : 335,
 "ppu" : 0.55
}

Notice that both "ppu" values are the same, whereas the value for the first should be 0.69 (109.71/159) and for the second 0.55.

So, after digging a little bit (maybe too much, considering the time of writing this email) into the source code, I managed to generate the desired output. For that, I have changed both the simple_plan.json and some code in CollapsingAggregateROP that is incompatible with the current description in the Apache Drill Plan Syntax. Mainly because of the latter, I preferred to start some discussion here instead of in the JIRA ticket, but if you want me to file the JIRA first, I will do it (please, take into account I am a complete newbie).

Well, basically my solution involves changing the aggregate operation as follows (notice now it has a target):
         op: "collapsingaggregate",
         within: "ppusegment",
         carryovers: [ "donuts.ppu" ],
         target: "donuts.ppu",
         aggregations: [
           { ref: "donuts.typeCount",  expr: "count(1)" },
           { ref: "donuts.quantity",  expr: "sum(quantity)" },
           { ref: "donuts.sales",  expr: "sum(donuts.ppu * quantity)" }
         ]
To make this works, I have had to ignore the fact that "*will draw the carryover variables from a record where the target field references has a true value*" [ADPS]. When no target is used, the carryover contains a pointer to the wrong register in method writeOutputRecord (whereas in the method consumeCurrent -- where the condition for target is checked -- the instance of the register is the one of the current segment).

I acknowledge that this solution is just a workaround, since it does not comply with the ADPS, but I hope at least it serves to give some indications about where (and how) a real solution could be found (i.e., the carryovers should to be computed when they point to the actual register).

Regards,
Alejandro

PS: an alternative solution would be to ignore the carryovers from the initial query plan and (somehow) be able to print also the ppu field in the projection stage.

--
 Alejandro Bellogin Kouki
 http://rincon.uam.es/dir?cw=435275268554687

Reply via email to