got it, thanks for clarifications, seems it need additional research.

I think that it is useful to say that a conversion is necessary, and also what is the target type of the conversion. For example, in bigquery everything is BIGINT, and the functions should convert other argument types to BIGINT (see the SAFE_* functions). In other dialects the default type of arguments is INTEGER. This information is not available after the type checker is finished.

It's also conceivable that a function has truly overloaded forms, where it does different things depending on the argument types, but that's the rare case.

This is a form of coercion really, but it's function-dependent. Ideally the existing coercion mechanism would take care of this.

Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 3:24 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation

We intentionally separate validation from sql-to-rel. That has many benefits (such as discouraging us from mutating the AST while validating, or thinking about RelNodes and RexNodes), but one problem is that we end up doing some things twice.

OperandTypeChecker (or something like it) needs to be able to say ‘it’s ok to pass a decimal(5, 2) value to an integer argument, because we can do an implicit conversion’. Validation will succeed, and at sql-to-rel time we should recover (or regenerate) the verdict that an implicit conversion is needed.

Do we need to make a note of which conversion? I don’t know. Maybe it’s enough to know that the argument and parameter type are different, because maybe for a given source-target type pair only one conversion is possible.

I have noticed a profusion of overloaded methods in SqlFunctions.java (e.g. log has 4 overloads - one for each combination of double and BigDecimal, and the BigDecimal variants ultimately just convert the BigDecimal to a double) and maybe we can start to reduce that profusion.

Julian


On Sep 6, 2024, at 1:11 PM, Mihai Budiu <mbu...@gmail.com> wrote:

The problem is that I don't know how to fix this. Would be glad to do the work if someone helped design a way to do it.

Turns out that the OperandTypeChecker can mutate the RexNodes it receives as arguments, but I am not sure that this is "safe", because it looks like the API has been designed with immutability in mind. For example, these combinators that can combine multiple type-checkers seem to assume immutability.

Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 12:59 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation

I totally agree.

Julian

On Sep 6, 2024, at 12:57, Mihai Budiu <mbu...@gmail.com> wrote:

A problem with the way Calcite handles such functions is that the type-checker and the runtime are separate and do not communicate with each other. In the type checking code that people usually write for a function they write an OperandTypeChecker, which makes some assumptions about the range of implicit casts that the function allows. For example, for SUBSTRING such a type checker would allow any numeric types, or even string types, with the assmption that they would be converted to integer. As a result of the type-checking, the function is accepted, but the implicit casts that were inferred are not inserted as explicit casts to the arguments of the RexCall.

So when the runtime has to implement the function it has no idea about the kinds of arguments that may be expected, and what it's supposed to do with them. The runtime has to handle many more cases, e.g,, INTEGER, TINYINT, DECIMAL, STRING. This makes it difficult to keep the runtime in sync with the type-checker as people implement new versions of the many functions from different dialects. Moreover, the runtime may incorrectly convert such values. For example, if you call decimal.intValue() you are not checking for overflow and not rounding in the same way a CAST(decimal AS INTEGER) may do.

Ideally the type-checker for functions would be allowed to insert implicit casts, this would simplify a lot the task of both generating code in the runtime, and of implementing the runtime itself.

For substring the type-checker could insert a cast from DECIMAL to INTEGER. The runtime now only has to worry about INTEGER values.

Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 12:44 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation

The trouble is, we are going beyond the standard. The standard says

The declared type of a <start position>, <string length>,
<regex occurrence>, or <regex capture group>
shall be exact numeric with scale 0 (zero).

People often misunderstand the “shall” in standards. It is talking about the SQL query, not the implementation. A standards-compliant query will only have integer values of <start position>, and compliant implementation (such as Calcite) must handle such queries.

The standard doesn’t say anything about what should happen if the query is not standards-compliant, viz <start position> is not an integer. In particular, it doesn’t say we should fail when we receive such queries. But we can do anything, like return -77 or rounding the integer up, or rounding it down, and still be compliant.

What we have decided to do in Calcite is to apply our usual rules for converting decimal arguments to integers. I am saying let’s define those ‘usual rules’, and apply it to all functions, whether they are in the SQL standard, in a library (such as Postgres or BigQuery) or user-defined functions.


Julian



On Sep 6, 2024, at 1:02 AM, stanilovsky evgeny <estanilovs...@gridgain.com> wrote:

I understand you are talking about, but ... LEFT is just extension under standard functions collection and DB implementors are free here, they can write restrictions in documentation [1] or treat fractional types as integer implicitly. I don`t think we need to change (LEFT, RIGHT and so on string functions), they already restricted in documentation, while standard (substring, overlay) need to be improved.

[1] https://learn.microsoft.com/en-us/sql/t-sql/functions/left-transact-sql?view=sql-server-ver16

On Thu, 05 Sep 2024 21:07:27 +0300, Julian Hyde <jh...@apache.org> wrote:

Thanks for taking the time to look up the SQL standard. But
unfortunately the SQL standard isn't much use here, because it doesn't
say much about implicit conversions - it leaves that to the
implementor, such as Calcite or Postgres.

Calcite has its own rules for implicit conversions of arguments. It
also has lots of functions that take numeric and integer arguments.
Calcite's users expect that the conversion rules are consistent across
all functions. Picking one at random, LEFT(string, length) is enabled
in several libraries including Postgres. You should implement the
conversion for SUBSTRING in such a way that LEFT gets it for free.

On Thu, Sep 5, 2024 at 7:20 AM stanilovsky evgeny
<estanilovs...@gridgain.com> wrote:

Thank all for reply, in section : 6.18 <string value function>
I see only OVERLAY function that not corresponds here too:

<character overlay function> ::=
OVERLAY <left paren> <character value expression> PLACING <character value
expression>
FROM <start position> [ FOR <string length> ]
[ USING <char length units> ] <right paren>

<start position> ::= <numeric value expression>
<string length> ::= <numeric value expression>

or Julian you are talking not only about this kind of functions ?
In such a case it consumes a lot of time for such a check.

Why we can`t move here sequentially ? Just fix operand types ? What
generic approach you are talking about ?
SqlSingleOperandTypeChecker STRING_INTEGER is used only in : SqlFunction
REPEAT
SqlSingleOperandTypeChecker STRING_INTEGER_INTEGER only in
SqlSubstringFunction
STRING_STRING_INTEGER, STRING_STRING_INTEGER_INTEGER only in
SqlOverlayFunction


Is the desired behavior specific to the SUBSTRING function? Or should it be generic, for all functions that have an argument of type INTEGER?

If it’s generic, can you give some other functions as examples where we
would want this behavior.

Also, if it’s generic, the code should probably not be part of the
SUBSTRING function.


On Sep 4, 2024, at 8:31 AM, Norman Jordan
<norman.jor...@improving.com.INVALID> wrote:

This makes sense. Running a quick test with MySQL, I can see that
decimal values do not give an error. It appears that MySQL will round a
decimal value to the nearest integer value.
________________________________
From: stanilovsky evgeny <estanilovs...@gridgain.com>
Sent: Wednesday, September 4, 2024 5:17 AM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: QUESTION: SUBSTRING implementation

Hi all, i want to discuss current SUBSTRING func implementation.

Lets take a standard and found:
<character substring function> ::=
SUBSTRING <left paren> <character value expression> FROM <start
position>
[ FOR <string length> ] <right paren>

and further : <start position> ::= <numeric value expression>

thus it not restrict <start position> for only integer types

Calcite documentation says:
SUBSTRING(string FROM integer FOR integer) (we see restrictions here)

Lets dig deeper:
Calcite implementation operands checker not restrict operands too : 1. OperandTypes.STRING_NUMERIC - (1 param: substring ('asd', 2)) (not
restricted params)
2. OperandTypes.STRING_INTEGER_INTEGER - (2 params: substring ('asd', 2,
3)) (only integer)

So if i call "SELECT SUBSTRING('asd', 1.2)" runtime exception will
occur:
java.lang.RuntimeException: while resolving method 'substring[class
java.lang.String, class java.math.BigDecimal]' in class class
org.apache.calcite.runtime.SqlFunctions
           at
org.apache.calcite.adapter.enumerable.EnumUtils.call(EnumUtils.java:770)
           at
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.call(RexImpTable.java:2866)
           at
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.implementSafe(RexImpTable.java:2847)

So i appeal to align (1 and 2 operands checker implementation, so for 2
operands it need: STRING_NUMERIC_NUMERIC) and append appropriate
implementation (with will cut off fractional numeric part) into
SqlFunctions.

wdyt ? if there will be no objections i will fill an issue.

thanks !
Warning: The sender of this message could not be validated and may not
be the actual sender.

Reply via email to