Re: Default operator override in Apache Calcite

Vladimir Ozerov Thu, 14 Jul 2022 00:14:32 -0700

Hi Julian,

The motivation for the proposal is a requirement to change the way how some
operators, like PLUS or SUM, does the type inference and validation. I
understand you concern about the complexity. How would you suggest to
override the behavior of PLUS/SUM in the downstream project, provided that
the static operators from the SqlStdOperatorTable injected uncontrollably
on any optimization stage?


Regards,
Vladimir.

пн, 11 июл. 2022 г. в 22:42, Julian Hyde <[email protected]>:

> Remember the rule of Chesterton’s fence: before you change something you
> must know why it exists the way it is. Since you don’t cover it in your
> email, let’s review the purpose of SqlStdOperatorTable.
>
> As the ’Std’ indicates, it is intended to contain the operators that are
> defined by the SQL standard. And which presumably would be enabled in any
> configuration.
>
> Because these operators are standard, they are resolved statically (by
> referencing static field names such as SqlStdOperatorTable.AS) and can
> therefore be used in locations without an operator table, such as the
> parser.
>
> For many years SqlStdOperatorTable was the default location for any new
> operator, and a lot of crap ended up there. I have been trying to restore
> its old purpose by encouraging people to use SqlInternalOperatorTable (for
> operators that are not visible in SQL but are used internally) and
> SqlLibraryOperators (which are enabled for one or more libraries, e.g. for
> Oracle or BigQuery compatibility).
>
> The alternative to static resolution is lookup by name/syntax and overload
> resolution. This is performed by the validator and is complex and expensive.
>
> You are proposing to add a third means of lookup, perhaps a middle way
> that is more flexible than static lookup but not too complex or expensive.
> However, it is a third way, doesn’t remove the existing two ways, and
> therefore will probably make things more complex, albeit more customizable.
>
> I think you should start the discussion with the problem(s) you seek to
> solve. There may be other solutions. For example, we wanted to customize
> how the result type of standard aggregate functions (e.g. AVG) is derived
> [1], and one solution would have been to replace the AVG operator, but we
> instead added the policy to TypeSystem.
>
> There is merit to your suggestion of building on, and aligning with,
> SqlKind. I added it because I wanted to make simple things simple (i.e.
> prioritize the 80% case). However, it its granularity is sometimes too
> coarse. An instance of that arrived just today [2].
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-1945 <
> https://issues.apache.org/jira/browse/CALCITE-1945>
>
> [2] https://issues.apache.org/jira/browse/CALCITE-5207 <
> https://issues.apache.org/jira/browse/CALCITE-5207>
>
>
>
> > On Jul 10, 2022, at 9:41 PM, Yanjing Wang <[email protected]>
> wrote:
> >
> > +1, Thanks Vladimir for pointing the pains out and the solutions looks
> more
> > clean and extensible.
> >
> > Vladimir Ozerov <[email protected]> 于2022年7月9日周六 15:01写道：
> >
> >> Hi,
> >>
> >> Apache Calcite has a powerful but complicated and non-documented
> function
> >> library. Some projects may require overriding some of the existing
> >> operators to introduce custom type deduction, custom validation, etc.
> This
> >> includes the base arithmetic functions (e.g, disallow INT + VARCHAR),
> >> aggregate functions (e.g., custom precision extension), etc.
> >>
> >> One convenient way of doing this is to re-define the function in your
> >> custom operator table. However, this doesn't work because Apache Calcite
> >> core uses the direct references to SqlStdOperatorTable. It starts with
> the
> >> parser [1] and validator [2]. If you manage to inject your functions at
> >> this stage (e.g., using a custom validator implementation or a custom
> >> SqlVisitor), the sql-to-rel converter will overwrite your functions
> >> still [3]. And even when you get the RelNode, optimization rules would
> >> silently replace your custom functions with the default ones [4].
> >>
> >> Alternatively, you may try extending some base interface, such as the
> >> TypeCoercion, but this doesn't give fine-grained control over the
> function
> >> behavior because you have to retain the existing function definitions
> to do
> >> coercion works.
> >>
> >> A better solution might be is to abstract out the function references
> >> through some sort of "factory"/"resolver", somewhat similar to the one
> used
> >> to resolve user-provided operators. For instance, the user may pass an
> >> optional desired operator table to parser/validator/converter configs
> and
> >> RelOptCluster. Then the "core" codebase could be refactored to
> dereference
> >> functions by SqlKind instead of SqlStdOperatorTable. If some required
> >> function types are missing from the SqlKind enum, we can add them. The
> >> default behavior would delegate to SqlStdOperatorTable, so the existing
> >> apps would not be affected.
> >>
> >> A "small" problem is that there are ~1500 usages of the
> SqlStdOperatorTable
> >> in the "core" module, but most of the usages are very straightforward to
> >> replace.
> >>
> >> This way, we would ensure a consistent function resolution throughout
> all
> >> query optimization phases. WDYT?
> >>
> >> Regards,
> >> Vladimir.
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/codegen/templates/Parser.jj#L7164
> >> [2]
> >>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6788
> >> [3]
> >>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1438
> >> [4]
> >>
> >>
> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java#L357
> >>
>
>

Re: Default operator override in Apache Calcite

Reply via email to