> On Jun 4, 2015, at 2:53 PM, Vladimir Sitnikov <[email protected]>
> wrote:
>
>> A query with an Aggregate, and a Filter:
>
> Java code is broken there. It looks like a copy&paste error.
Thanks. I’ve fixed it.
>
>> Creating an expression in Calcite's relational algebra has always been too complicated
>
> 1) Can you please show "a complicated rule" being rewritten to this nice API?
> I do not want to diminish the value of the API, it might well serve its
> purpose.
It’s not going to make rules radically simpler. And that wasn’t the goal.
However, here are a few small advantages that might add up.
1. There are quite a few rules that have multiple factories (e.g. an
AggregateFactory and a FilterFactory), and they can be replaced by a single
RelBuilderProto. And we will not need to change the constructor of the rule
when it needs a new factory, say a ProjectFactory.
2. Rules might be simpler to debug if the mappings are written in terms of
field names rather than field ordinals. Granted, I make it clear in the docs that
you should not rely on field names staying the same when a rel is transformed.
But it is safe to build a map from names to ordinals within a rule, because we
are dealing with a fixed rel whose field names are unique and do not change. A
map keyed by strings should be easier to debug.
3. There are lots of useful methods scattered all over the place. For example,
RelOptUtil.createFilter(RelNode, Iterable<? extends RexNode>) is great. It
combines the predicates intelligently, and skips the filter if the predicates
combine to true. But it doesn’t use a factory. Oops! Now we have to come back
in a couple of months and fix it…
Another example: RelOptUtil.createProject doesn’t use a factory. It always
returns a LogicalProject.
There are other useful methods in RexUtil and RexBuilder. Most people who have
never written a rule before don't know where to find these, so they end up
doing them long-hand, or doing them wrong.
There is a temptation to put this functionality into the constructors of each
RelNode sub-class, but you would then end up with lots of constructors, and
inconsistent functionality across the variants (e.g. HiveProject,
MongoProject, LogicalProject).
RelBuilder does seem to be the Right Place to put all of (or most of) this
logic.
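To make points 1 to 3 a little more concrete, here is a toy, stand-alone Java sketch. None of these classes are Calcite's: ToyRel, ToyRelFactory, ToyLogicalFactory and ToyRelBuilder are hypothetical, and predicates are plain strings rather than RexNodes. It shows a builder that receives one factory object (rather than one factory per operator), resolves fields by name to ordinals, and, as a much simpler cousin of RelOptUtil.createFilter, drops TRUE predicates, ANDs the rest, and skips the Filter entirely when nothing is left.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a relational expression (not a Calcite class):
// an operator name, its output field names, its inputs, and a detail string.
final class ToyRel {
  final String op;
  final List<String> fields;
  final List<ToyRel> inputs;
  final String detail;

  ToyRel(String op, List<String> fields, List<ToyRel> inputs, String detail) {
    this.op = op;
    this.fields = fields;
    this.inputs = inputs;
    this.detail = detail;
  }
}

// One factory object instead of a FilterFactory, a ProjectFactory, and so on.
// Supporting a new operator means adding a method here, not a constructor arg.
interface ToyRelFactory {
  ToyRel filter(ToyRel input, String condition);

  ToyRel project(ToyRel input, List<Integer> ordinals);
}

// Creates "Logical" nodes; a Hive- or Mongo-style variant would be another implementation.
final class ToyLogicalFactory implements ToyRelFactory {
  public ToyRel filter(ToyRel input, String condition) {
    return new ToyRel("LogicalFilter", input.fields, List.of(input), condition);
  }

  public ToyRel project(ToyRel input, List<Integer> ordinals) {
    List<String> names = new ArrayList<>();
    for (int i : ordinals) {
      names.add(input.fields.get(i));
    }
    return new ToyRel("LogicalProject", names, List.of(input), ordinals.toString());
  }
}

// Hypothetical builder: every node it creates goes through the single factory.
final class ToyRelBuilder {
  private final ToyRelFactory factory;
  private ToyRel top;

  ToyRelBuilder(ToyRelFactory factory) {
    this.factory = factory;
  }

  ToyRelBuilder push(ToyRel rel) {
    top = rel;
    return this;
  }

  // Field names are unique within one fixed rel, so a name maps to exactly one ordinal.
  int field(String name) {
    int ordinal = top.fields.indexOf(name);
    if (ordinal < 0) {
      throw new IllegalArgumentException("field " + name + " not found in " + top.fields);
    }
    return ordinal;
  }

  // Drops TRUE predicates and ANDs the rest; if nothing remains, no Filter is created.
  ToyRelBuilder filter(String... predicates) {
    List<String> kept = new ArrayList<>();
    for (String p : predicates) {
      if (!p.equals("TRUE")) {
        kept.add(p);
      }
    }
    if (!kept.isEmpty()) {
      top = factory.filter(top, String.join(" AND ", kept));
    }
    return this;
  }

  // Projects fields by name; each name is resolved to an ordinal via field().
  ToyRelBuilder project(String... names) {
    List<Integer> ordinals = new ArrayList<>();
    for (String name : names) {
      ordinals.add(field(name));
    }
    top = factory.project(top, ordinals);
    return this;
  }

  ToyRel build() {
    return top;
  }
}
```

In this sketch a rule would hold a single ToyRelFactory; teaching it a new operator means adding a method to that one interface rather than another constructor parameter on every rule.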
> 2) Adding correlations into the equations might be tricky.
> Try something like "select deptno from depts where exists (select null
> from emps where empno = deptno+1)".
Yes, I am staying away from SQL extensions such as correlated sub-queries for
now.
> I have not faced a case where it was hard to pass all the required parameters.
> Usually, it is very tricky to figure out all those
> permute(fromMapping(toInversePermutation)) kinds of field shuffling.
> That code becomes write-only code. It does not look like the
> builder solves this problem.
>
> Here's a random quote from the sources (~RelFieldTrimmer). Who on
> Earth can understand that? Well, someone can, but it is complicated.
>
>     // Offset due to the number of system fields having changed.
>     Mapping mapping =
>         Mappings.create(
>             MappingType.INVERSE_SURJECTION,
>             rowType.getFieldCount(),
>             groupCount + indicatorCount + usedAggCallCount);
>
>     final ImmutableBitSet newGroupSet =
>         Mappings.apply(inputMapping, aggregate.getGroupSet());
>
>     final ImmutableList<ImmutableBitSet> newGroupSets =
>         ImmutableList.copyOf(
>             Iterables.transform(aggregate.getGroupSets(),
>                 new Function<ImmutableBitSet, ImmutableBitSet>() {
>                   public ImmutableBitSet apply(ImmutableBitSet input) {
>                     return Mappings.apply(inputMapping, input);
>                   }
>                 }));
Agreed, this is Swiss-watch code. Luckily, very few people need to modify it,
and lots of people can use it without understanding it.
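For anyone reading along, the heart of the group-set part of that quote is remapping sets of field ordinals after some input fields have been trimmed away. Below is a hypothetical, stand-alone approximation of what the quoted Mappings.apply(inputMapping, ...) calls appear to do. It uses java.util.BitSet and an int[] old-to-new mapping (with -1 marking a trimmed field) in place of Calcite's ImmutableBitSet and Mapping; GroupSetRemapper is an invented name, and this is a sketch, not Calcite's implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Calcite.
final class GroupSetRemapper {
  private GroupSetRemapper() {}

  // oldToNew[i] is the new ordinal of input field i, or -1 if field i was trimmed.
  // Every set bit in 'bits' is moved to its new ordinal.
  static BitSet apply(int[] oldToNew, BitSet bits) {
    BitSet result = new BitSet();
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      int target = oldToNew[i];
      if (target < 0) {
        // A grouped field should never have been trimmed.
        throw new IllegalArgumentException("field " + i + " was trimmed but is still grouped on");
      }
      result.set(target);
    }
    return result;
  }

  // Remaps every grouping set; a plain loop stands in for Iterables.transform.
  static List<BitSet> applyAll(int[] oldToNew, List<BitSet> groupSets) {
    List<BitSet> result = new ArrayList<>();
    for (BitSet groupSet : groupSets) {
      result.add(apply(oldToNew, groupSet));
    }
    return result;
  }
}
```

For example, if the input originally had five fields and fields 0 and 2 were trimmed, the mapping is {-1, 0, -1, 1, 2}, and a group set {1, 3} becomes {0, 1}.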
My main target for this feature is people who would like to use Calcite but
whose front-end language is not SQL. And if it makes rules a bit easier to
write, and a bit easier to re-use, that would be great also.
Julian