Congratulations, btw, to the Drill workshop attendees. Here are some surprising statistics:
30 signups. 28 or 29 attendees showed up (unheard of in my experience).

Essentially all attendees actually prepared for the workshop at a very high level. Almost all had Drill downloaded and compiled and were ready to rock and roll.

Two separate groups found this bug, and a third group seemed to sense that something was amiss, but they didn't quite connect the dots.

If this is the average calibre of the CWI community, then I expect some interesting questions tomorrow at the CWI talk on clustering.

On Wed, Mar 20, 2013 at 4:56 AM, Alejandro Bellogin Kouki <[email protected]> wrote:

> Hi all,
>
> this morning I attended the Drill workshop in Amsterdam and, like a
> couple of other people, my colleagues and I found a bug regarding the
> simple_plan.json query. Its original output was:
> {
>   "sales" : 109.71,
>   "typeCount" : 2,
>   "quantity" : 159,
>   "ppu" : 0.55
> }
> {
>   "sales" : 184.25,
>   "typeCount" : 2,
>   "quantity" : 335,
>   "ppu" : 0.55
> }
>
> Notice that both "ppu" values are the same, whereas the value for the
> first should be 0.69 (109.71/159) and for the second 0.55.
>
> So, after digging a little bit (maybe too much, considering the time of
> writing this email) into the source code, I managed to generate the desired
> output. For that, I have changed both the simple_plan.json and some code in
> CollapsingAggregateROP that is incompatible with the current description in
> the Apache Drill Plan Syntax. Mainly because of the latter, I preferred to
> start some discussion here instead of in the JIRA ticket, but if you want
> me to file the JIRA first, I will do it (please take into account that I am
> a complete newbie).
>
> Well, basically my solution involves changing the aggregate operation as
> follows (notice now it has a target):
>   op: "collapsingaggregate",
>   within: "ppusegment",
>   carryovers: [ "donuts.ppu" ],
>   target: "donuts.ppu",
>   aggregations: [
>     { ref: "donuts.typeCount", expr: "count(1)" },
>     { ref: "donuts.quantity", expr: "sum(quantity)" },
>     { ref: "donuts.sales", expr: "sum(donuts.ppu * quantity)" }
>   ]
> To make this work, I have had to ignore the fact that the operator "*will
> draw the carryover variables from a record where the target field
> references has a true value*" [ADPS]. When no target is used, the carryover
> contains a pointer to the wrong register in method writeOutputRecord
> (whereas in the method consumeCurrent, where the condition for target is
> checked, the instance of the register is the one of the current segment).
>
> I acknowledge that this solution is just a workaround, since it does not
> comply with the ADPS, but I hope at least it serves to give some
> indications about where (and how) a real solution could be found (i.e., the
> carryovers should be computed when they point to the actual register).
>
> Regards,
> Alejandro
>
> PS: an alternative solution would be to ignore the carryovers from the
> initial query plan and (somehow) be able to print also the ppu field in the
> projection stage.
>
> --
> Alejandro Bellogin Kouki
> http://rincon.uam.es/dir?cw=435275268554687
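
To make the symptom concrete, here is a minimal Python sketch of a collapsing aggregate over segments. It is not Drill's CollapsingAggregateROP code. The input records are hypothetical, chosen only so that the totals match the output quoted above, and the "stale" record is my assumption about how a carryover read from the wrong register would behave.

```python
from decimal import Decimal
from itertools import groupby

# Hypothetical input records (not from the workshop data); they are chosen
# only so that the per-segment totals match the output quoted in the email.
RECORDS = [
    {"ppu": Decimal("0.69"), "quantity": 100},
    {"ppu": Decimal("0.69"), "quantity": 59},
    {"ppu": Decimal("0.55"), "quantity": 200},
    {"ppu": Decimal("0.55"), "quantity": 135},
]


def collapse(records, carry_from_segment):
    """Collapse consecutive records sharing a ppu into one output row each.

    carry_from_segment=True reads the carryover (ppu) from a record of the
    segment being written. False reads it from the last record seen overall,
    an assumed stand-in for the "pointer to the wrong register" symptom.
    """
    segments = [list(rows) for _, rows in groupby(records, key=lambda r: r["ppu"])]
    stale = records[-1]  # assumption: the wrong register is the last one consumed
    out = []
    for rows in segments:
        source = rows[0] if carry_from_segment else stale
        out.append({
            "sales": sum(r["ppu"] * r["quantity"] for r in rows),
            "typeCount": len(rows),
            "quantity": sum(r["quantity"] for r in rows),
            "ppu": source["ppu"],  # the carryover field
        })
    return out


buggy = collapse(RECORDS, carry_from_segment=False)
fixed = collapse(RECORDS, carry_from_segment=True)

for row in fixed:
    print(row)
```

With these made-up records, the buggy variant reports ppu 0.55 for both segments while sales, typeCount and quantity stay correct, which is the pattern in the original output. Reading the carryover from the segment being written gives 0.69 and 0.55.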
