Re: Default operator override in Apache Calcite

Julian Hyde Mon, 11 Jul 2022 12:42:17 -0700

Remember the rule of Chesterton’s fence: before you change something you must 
know why it exists the way it is. Since you don’t cover it in your email, let’s 
review the purpose of SqlStdOperatorTable.

As the ’Std’ indicates, it is intended to contain the operators that are 
defined by the SQL standard. And which presumably would be enabled in any 
configuration.

Because these operators are standard, they are resolved statically (by 
referencing static field names such as SqlStdOperatorTable.AS) and can 
therefore be used in locations without an operator table, such as the parser.

For many years SqlStdOperatorTable was the default location for any new 
operator, and a lot of crap ended up there. I have been trying to restore its 
old purpose by encouraging people to use SqlInternalOperatorTable (for 
operators that are not visible in SQL but are used internally) and 
SqlLibraryOperators (which are enabled for one or more libraries, e.g. for 
Oracle or BigQuery compatibility).

The alternative to static resolution is lookup by name/syntax and overload 
resolution. This is performed by the validator and is complex and expensive.

You are proposing to add a third means of lookup, perhaps a middle way that is 
more flexible than static lookup but not too complex or expensive. However, it 
is a third way, doesn’t remove the existing two ways, and therefore will 
probably make things more complex, albeit more customizable.

I think you should start the discussion with the problem(s) you seek to solve. 
There may be other solutions. For example, we wanted to customize how the 
result type of standard aggregate functions (e.g. AVG) is derived [1], and one 
solution would have been to replace the AVG operator, but we instead added the 
policy to TypeSystem.

There is merit to your suggestion of building on, and aligning with, SqlKind. I 
added it because I wanted to make simple things simple (i.e. prioritize the 80% 
case). However, it its granularity is sometimes too coarse. An instance of that 
arrived just today [2].

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-1945 
<https://issues.apache.org/jira/browse/CALCITE-1945>

[2] https://issues.apache.org/jira/browse/CALCITE-5207 
<https://issues.apache.org/jira/browse/CALCITE-5207> 

> On Jul 10, 2022, at 9:41 PM, Yanjing Wang <[email protected]> wrote:
> 
> +1, Thanks Vladimir for pointing the pains out and the solutions looks more
> clean and extensible.
> 
> Vladimir Ozerov <[email protected]> 于2022年7月9日周六 15:01写道：
> 
>> Hi,
>> 
>> Apache Calcite has a powerful but complicated and non-documented function
>> library. Some projects may require overriding some of the existing
>> operators to introduce custom type deduction, custom validation, etc. This
>> includes the base arithmetic functions (e.g, disallow INT + VARCHAR),
>> aggregate functions (e.g., custom precision extension), etc.
>> 
>> One convenient way of doing this is to re-define the function in your
>> custom operator table. However, this doesn't work because Apache Calcite
>> core uses the direct references to SqlStdOperatorTable. It starts with the
>> parser [1] and validator [2]. If you manage to inject your functions at
>> this stage (e.g., using a custom validator implementation or a custom
>> SqlVisitor), the sql-to-rel converter will overwrite your functions
>> still [3]. And even when you get the RelNode, optimization rules would
>> silently replace your custom functions with the default ones [4].
>> 
>> Alternatively, you may try extending some base interface, such as the
>> TypeCoercion, but this doesn't give fine-grained control over the function
>> behavior because you have to retain the existing function definitions to do
>> coercion works.
>> 
>> A better solution might be is to abstract out the function references
>> through some sort of "factory"/"resolver", somewhat similar to the one used
>> to resolve user-provided operators. For instance, the user may pass an
>> optional desired operator table to parser/validator/converter configs and
>> RelOptCluster. Then the "core" codebase could be refactored to dereference
>> functions by SqlKind instead of SqlStdOperatorTable. If some required
>> function types are missing from the SqlKind enum, we can add them. The
>> default behavior would delegate to SqlStdOperatorTable, so the existing
>> apps would not be affected.
>> 
>> A "small" problem is that there are ~1500 usages of the SqlStdOperatorTable
>> in the "core" module, but most of the usages are very straightforward to
>> replace.
>> 
>> This way, we would ensure a consistent function resolution throughout all
>> query optimization phases. WDYT?
>> 
>> Regards,
>> Vladimir.
>> 
>> [1]
>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/codegen/templates/Parser.jj#L7164
>> [2]
>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6788
>> [3]
>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1438
>> [4]
>> 
>> https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java#L357
>>

Re: Default operator override in Apache Calcite

Reply via email to