Hi all,
this morning I attended the Drill workshop in Amsterdam and, like a
couple of other people, my colleagues and I found a bug in the
simple_plan.json query. Its original output was:
{
  "sales" : 109.71,
  "typeCount" : 2,
  "quantity" : 159,
  "ppu" : 0.55
}
{
  "sales" : 184.25,
  "typeCount" : 2,
  "quantity" : 335,
  "ppu" : 0.55
}
Notice that both "ppu" values are the same, whereas the value for the
first should be 0.69 (109.71/159) and for the second 0.55.
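As a quick sanity check of the arithmetic above, here is a small Python sketch that recomputes sales/quantity for the two records quoted in this email and flags the one whose reported "ppu" disagrees. The helper name expected_ppu is mine, purely for illustration, and has nothing to do with Drill itself.

```python
# The two records exactly as quoted in the output above.
records = [
    {"sales": 109.71, "typeCount": 2, "quantity": 159, "ppu": 0.55},
    {"sales": 184.25, "typeCount": 2, "quantity": 335, "ppu": 0.55},
]


def expected_ppu(record):
    """Price per unit implied by the aggregated sales and quantity."""
    return round(record["sales"] / record["quantity"], 2)


expected = [expected_ppu(r) for r in records]

# Indices of records whose reported ppu differs from sales / quantity.
mismatches = [i for i, r in enumerate(records) if expected_ppu(r) != r["ppu"]]

print(expected)
print(mismatches)
```

Only the first record is flagged, which matches what we saw at the workshop.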
So, after digging a little bit (maybe too much, considering the time of
writing this email) into the source code, I managed to generate the
desired output. For that, I have changed both the simple_plan.json and
some code in CollapsingAggregateROP that is incompatible with the
current description in the Apache Drill Plan Syntax. Mainly because of
the latter, I preferred to start some discussion here instead of in the
JIRA ticket, but if you want me to file the JIRA first, I will do it
(please, take into account I am a complete newbie).
Well, basically my solution involves changing the aggregate operation as
follows (notice now it has a target):
    op: "collapsingaggregate",
    within: "ppusegment",
    carryovers: [ "donuts.ppu" ],
    target: "donuts.ppu",
    aggregations: [
      { ref: "donuts.typeCount", expr: "count(1)" },
      { ref: "donuts.quantity", expr: "sum(quantity)" },
      { ref: "donuts.sales", expr: "sum(donuts.ppu * quantity)" }
    ]
To make this work, I had to ignore the fact that the operator "*will draw the
carryover variables from a record where the target field references has
a true value*" [ADPS]. When no target is used, the carryover contains a
pointer to the wrong register in method writeOutputRecord (whereas in
the method consumeCurrent -- where the condition for target is checked
-- the instance of the register is the one of the current segment).
I acknowledge that this solution is just a workaround, since it does not
comply with the ADPS, but I hope at least it serves to give some
indications about where (and how) a real solution could be found (i.e.,
the carryovers should be computed while they still point to the
current register).
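To illustrate what I think is happening, here is a toy Python model. It is not Drill's CollapsingAggregateROP code, only my reading of the behaviour: the function collapse, its flag capture_on_consume, and the per-row quantities are all invented for the example (only the two ppu values and the totals come from the output above). It contrasts reading the carryover when the segment is written, by which time the shared record already belongs to the next segment, with capturing it while each record is consumed.

```python
# Toy model only: this is NOT Drill's CollapsingAggregateROP code.
# Per-row quantities are invented; only the ppu values and totals match the email.
rows = [
    {"ppu": 0.69, "quantity": 100},
    {"ppu": 0.69, "quantity": 59},
    {"ppu": 0.55, "quantity": 300},
    {"ppu": 0.55, "quantity": 35},
]


def collapse(rows, capture_on_consume):
    """Collapse consecutive rows with the same ppu into one record each."""
    out = []
    state, carried, key = None, None, None
    for current in rows:
        if state is not None and current["ppu"] != key:
            # Segment closed: emit it. The late path reads the shared record,
            # which by now already holds the first row of the next segment.
            ppu = carried if capture_on_consume else current["ppu"]
            out.append({"ppu": ppu, **state})
            state = None
        if state is None:
            key = current["ppu"]
            state = {"typeCount": 0, "quantity": 0}
        state["typeCount"] += 1
        state["quantity"] += current["quantity"]
        # Early path: remember the carryover while the row is being consumed.
        carried = current["ppu"]
    if state is not None:
        # Flush the last open segment.
        ppu = carried if capture_on_consume else current["ppu"]
        out.append({"ppu": ppu, **state})
    return out


late = collapse(rows, capture_on_consume=False)
early = collapse(rows, capture_on_consume=True)
print([r["ppu"] for r in late])
print([r["ppu"] for r in early])
```

In this model the late read gives 0.55 for both segments, like the original output, while capturing during consumption gives 0.69 and 0.55.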
Regards,
Alejandro
PS: an alternative solution would be to ignore the carryovers from the
initial query plan and (somehow) be able to print also the ppu field in
the projection stage.
--
Alejandro Bellogin Kouki
http://rincon.uam.es/dir?cw=435275268554687