In relational algebra

  select sum(x + y)
  from (select x, y from t)

is identical to

  select sum(xy)
  from (select x + y as xy from t)

because relational algebra does not have “sub-queries”. So, the one thing your 
rewrite rule is accomplishing is converting

  case when product = ‘foo’ when sum(units) else 0 end

into

  sum(case when product = ‘foo’ then 0 else units end)

That rewrite seems valid, as long as units is not null and groups are never 
empty. (If a group is empty, as can happen in windowed aggregates, or all of 
its units values are null, then sum would return null not 0.)

In general, the rewrite seems to be to convert f(agg(x)) into agg(g(x)).

For example if f is “2 *” and agg is “sum”, then g would be “2 *”. Thus “2 * 
sum(units)” becomes “sum(2 * units)”. But as you see, after the rewrite we are 
doing more work (because we are multiplying more values by 2). So it’s valid 
but probably not beneficial. We’d have to be careful how we apply this rule.

I’d name this rule “ProjectAggregateTransposeRule”. To make get it into 
Calcite, you would need to log a JIRA case, convert your code from Kotlin into 
Java, write some test cases in RelOptRulesTest (including a case where the rule 
does NOT fire because null values would make it invalid), and create a PR. I’d 
want the code to be able to handle a bit more than “CASE … ELSE 0 END” applied 
to “SUM”. We want the rule to be extensible to handle other patterns.

Julian


> On Aug 24, 2018, at 7:09 AM, Z. S. <[email protected]> wrote:
> 
> Hi,
> 
> I'm interested in writing a rule to push certain aggregate rexes to a lower
> subqueries to be able to remove not needed group by statements. In SQL this
> would look by transforming:
> 
> select t2.id, sum(case when product = 'foo' then sum_units else 0 end)
> from ( select id, product, sum(units) AS sum_units from orders group by id,
> product ) t2
> group by t2.id
> 
> Into:
> select t2.id, sum(sum_units)
> from ( select id, sum(case when product = 'foo' then sum_units else 0 end)
> as sum_units from orders group by id ) t2 group by t2.id
> 
> I've been able to implement a rule for this specific example so I know in
> theory it's possible to make the rule generic. You can see the rule here:
> https://pastebin.com/G3x9CdAW
> 
> My question is if there's a better way to solve this? Are there any
> existing calcite tools that could be used? I tried looking at PushProjector
> class but it doesn't seem to be for this purpose? Is it OK to traverse the
> tree like I'm doing it by calling the getInput method and casting nodes to
> HepRelVertex or is there a better way to traverse the tree? How would you
> suggest implementing such a rule?
> 
> Thanks

Reply via email to