Hi Julian, The motivation for the proposal is a requirement to change the way how some operators, like PLUS or SUM, does the type inference and validation. I understand you concern about the complexity. How would you suggest to override the behavior of PLUS/SUM in the downstream project, provided that the static operators from the SqlStdOperatorTable injected uncontrollably on any optimization stage?
Regards, Vladimir. пн, 11 июл. 2022 г. в 22:42, Julian Hyde <[email protected]>: > Remember the rule of Chesterton’s fence: before you change something you > must know why it exists the way it is. Since you don’t cover it in your > email, let’s review the purpose of SqlStdOperatorTable. > > As the ’Std’ indicates, it is intended to contain the operators that are > defined by the SQL standard. And which presumably would be enabled in any > configuration. > > Because these operators are standard, they are resolved statically (by > referencing static field names such as SqlStdOperatorTable.AS) and can > therefore be used in locations without an operator table, such as the > parser. > > For many years SqlStdOperatorTable was the default location for any new > operator, and a lot of crap ended up there. I have been trying to restore > its old purpose by encouraging people to use SqlInternalOperatorTable (for > operators that are not visible in SQL but are used internally) and > SqlLibraryOperators (which are enabled for one or more libraries, e.g. for > Oracle or BigQuery compatibility). > > The alternative to static resolution is lookup by name/syntax and overload > resolution. This is performed by the validator and is complex and expensive. > > You are proposing to add a third means of lookup, perhaps a middle way > that is more flexible than static lookup but not too complex or expensive. > However, it is a third way, doesn’t remove the existing two ways, and > therefore will probably make things more complex, albeit more customizable. > > I think you should start the discussion with the problem(s) you seek to > solve. There may be other solutions. For example, we wanted to customize > how the result type of standard aggregate functions (e.g. AVG) is derived > [1], and one solution would have been to replace the AVG operator, but we > instead added the policy to TypeSystem. > > There is merit to your suggestion of building on, and aligning with, > SqlKind. I added it because I wanted to make simple things simple (i.e. > prioritize the 80% case). However, it its granularity is sometimes too > coarse. An instance of that arrived just today [2]. > > Julian > > [1] https://issues.apache.org/jira/browse/CALCITE-1945 < > https://issues.apache.org/jira/browse/CALCITE-1945> > > [2] https://issues.apache.org/jira/browse/CALCITE-5207 < > https://issues.apache.org/jira/browse/CALCITE-5207> > > > > > On Jul 10, 2022, at 9:41 PM, Yanjing Wang <[email protected]> > wrote: > > > > +1, Thanks Vladimir for pointing the pains out and the solutions looks > more > > clean and extensible. > > > > Vladimir Ozerov <[email protected]> 于2022年7月9日周六 15:01写道: > > > >> Hi, > >> > >> Apache Calcite has a powerful but complicated and non-documented > function > >> library. Some projects may require overriding some of the existing > >> operators to introduce custom type deduction, custom validation, etc. > This > >> includes the base arithmetic functions (e.g, disallow INT + VARCHAR), > >> aggregate functions (e.g., custom precision extension), etc. > >> > >> One convenient way of doing this is to re-define the function in your > >> custom operator table. However, this doesn't work because Apache Calcite > >> core uses the direct references to SqlStdOperatorTable. It starts with > the > >> parser [1] and validator [2]. If you manage to inject your functions at > >> this stage (e.g., using a custom validator implementation or a custom > >> SqlVisitor), the sql-to-rel converter will overwrite your functions > >> still [3]. And even when you get the RelNode, optimization rules would > >> silently replace your custom functions with the default ones [4]. > >> > >> Alternatively, you may try extending some base interface, such as the > >> TypeCoercion, but this doesn't give fine-grained control over the > function > >> behavior because you have to retain the existing function definitions > to do > >> coercion works. > >> > >> A better solution might be is to abstract out the function references > >> through some sort of "factory"/"resolver", somewhat similar to the one > used > >> to resolve user-provided operators. For instance, the user may pass an > >> optional desired operator table to parser/validator/converter configs > and > >> RelOptCluster. Then the "core" codebase could be refactored to > dereference > >> functions by SqlKind instead of SqlStdOperatorTable. If some required > >> function types are missing from the SqlKind enum, we can add them. The > >> default behavior would delegate to SqlStdOperatorTable, so the existing > >> apps would not be affected. > >> > >> A "small" problem is that there are ~1500 usages of the > SqlStdOperatorTable > >> in the "core" module, but most of the usages are very straightforward to > >> replace. > >> > >> This way, we would ensure a consistent function resolution throughout > all > >> query optimization phases. WDYT? > >> > >> Regards, > >> Vladimir. > >> > >> [1] > >> > >> > https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/codegen/templates/Parser.jj#L7164 > >> [2] > >> > >> > https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6788 > >> [3] > >> > >> > https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1438 > >> [4] > >> > >> > https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java#L357 > >> > >
