got it, thanks for clarifications, seems it need additional research.
I think that it is useful to say that a conversion is necessary, and
also what is the target type of the conversion. For example, in bigquery
everything is BIGINT, and the functions should convert other argument
types to BIGINT (see the SAFE_* functions). In other dialects the
default type of arguments is INTEGER. This information is not available
after the type checker is finished.
It's also conceivable that a function has truly overloaded forms, where
it does different things depending on the argument types, but that's the
rare case.
This is a form of coercion really, but it's function-dependent. Ideally
the existing coercion mechanism would take care of this.
Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 3:24 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation
We intentionally separate validation from sql-to-rel. That has many
benefits (such as discouraging us from mutating the AST while
validating, or thinking about RelNodes and RexNodes), but one problem is
that we end up doing some things twice.
OperandTypeChecker (or something like it) needs to be able to say ‘it’s
ok to pass a decimal(5, 2) value to an integer argument, because we can
do an implicit conversion’. Validation will succeed, and at sql-to-rel
time we should recover (or regenerate) the verdict that an implicit
conversion is needed.
Do we need to make a note of which conversion? I don’t know. Maybe it’s
enough to know that the argument and parameter type are different,
because maybe for a given source-target type pair only one conversion is
possible.
I have noticed a profusion of overloaded methods in SqlFunctions.java
(e.g. log has 4 overloads - one for each combination of double and
BigDecimal, and the BigDecimal variants ultimately just convert the
BigDecimal to a double) and maybe we can start to reduce that profusion.
Julian
On Sep 6, 2024, at 1:11 PM, Mihai Budiu <mbu...@gmail.com> wrote:
The problem is that I don't know how to fix this. Would be glad to do
the work if someone helped design a way to do it.
Turns out that the OperandTypeChecker can mutate the RexNodes it
receives as arguments, but I am not sure that this is "safe", because
it looks like the API has been designed with immutability in mind. For
example, these combinators that can combine multiple type-checkers seem
to assume immutability.
Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 12:59 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation
I totally agree.
Julian
On Sep 6, 2024, at 12:57, Mihai Budiu <mbu...@gmail.com> wrote:
A problem with the way Calcite handles such functions is that the
type-checker and the runtime are separate and do not communicate with
each other. In the type checking code that people usually write for a
function they write an OperandTypeChecker, which makes some
assumptions about the range of implicit casts that the function
allows. For example, for SUBSTRING such a type checker would allow any
numeric types, or even string types, with the assmption that they
would be converted to integer. As a result of the type-checking, the
function is accepted, but the implicit casts that were inferred are
not inserted as explicit casts to the arguments of the RexCall.
So when the runtime has to implement the function it has no idea about
the kinds of arguments that may be expected, and what it's supposed to
do with them. The runtime has to handle many more cases, e.g,,
INTEGER, TINYINT, DECIMAL, STRING. This makes it difficult to keep the
runtime in sync with the type-checker as people implement new versions
of the many functions from different dialects. Moreover, the runtime
may incorrectly convert such values. For example, if you call
decimal.intValue() you are not checking for overflow and not rounding
in the same way a CAST(decimal AS INTEGER) may do.
Ideally the type-checker for functions would be allowed to insert
implicit casts, this would simplify a lot the task of both generating
code in the runtime, and of implementing the runtime itself.
For substring the type-checker could insert a cast from DECIMAL to
INTEGER. The runtime now only has to worry about INTEGER values.
Mihai
________________________________
From: Julian Hyde <jhyde.apa...@gmail.com>
Sent: Friday, September 6, 2024 12:44 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: QUESTION: SUBSTRING implementation
The trouble is, we are going beyond the standard. The standard says
The declared type of a <start position>, <string length>,
<regex occurrence>, or <regex capture group>
shall be exact numeric with scale 0 (zero).
People often misunderstand the “shall” in standards. It is talking
about the SQL query, not the implementation. A standards-compliant
query will only have integer values of <start position>, and compliant
implementation (such as Calcite) must handle such queries.
The standard doesn’t say anything about what should happen if the
query is not standards-compliant, viz <start position> is not an
integer. In particular, it doesn’t say we should fail when we receive
such queries. But we can do anything, like return -77 or rounding the
integer up, or rounding it down, and still be compliant.
What we have decided to do in Calcite is to apply our usual rules for
converting decimal arguments to integers. I am saying let’s define
those ‘usual rules’, and apply it to all functions, whether they are
in the SQL standard, in a library (such as Postgres or BigQuery) or
user-defined functions.
Julian
On Sep 6, 2024, at 1:02 AM, stanilovsky evgeny
<estanilovs...@gridgain.com> wrote:
I understand you are talking about, but ... LEFT is just extension
under standard functions collection and DB implementors are free
here, they can write restrictions in documentation [1] or treat
fractional types as integer implicitly. I don`t think we need to
change (LEFT, RIGHT and so on string functions), they already
restricted in documentation, while standard (substring, overlay) need
to be improved.
[1]
https://learn.microsoft.com/en-us/sql/t-sql/functions/left-transact-sql?view=sql-server-ver16
On Thu, 05 Sep 2024 21:07:27 +0300, Julian Hyde <jh...@apache.org>
wrote:
Thanks for taking the time to look up the SQL standard. But
unfortunately the SQL standard isn't much use here, because it
doesn't
say much about implicit conversions - it leaves that to the
implementor, such as Calcite or Postgres.
Calcite has its own rules for implicit conversions of arguments. It
also has lots of functions that take numeric and integer arguments.
Calcite's users expect that the conversion rules are consistent
across
all functions. Picking one at random, LEFT(string, length) is enabled
in several libraries including Postgres. You should implement the
conversion for SUBSTRING in such a way that LEFT gets it for free.
On Thu, Sep 5, 2024 at 7:20 AM stanilovsky evgeny
<estanilovs...@gridgain.com> wrote:
Thank all for reply, in section : 6.18 <string value function>
I see only OVERLAY function that not corresponds here too:
<character overlay function> ::=
OVERLAY <left paren> <character value expression> PLACING
<character value
expression>
FROM <start position> [ FOR <string length> ]
[ USING <char length units> ] <right paren>
<start position> ::= <numeric value expression>
<string length> ::= <numeric value expression>
or Julian you are talking not only about this kind of functions ?
In such a case it consumes a lot of time for such a check.
Why we can`t move here sequentially ? Just fix operand types ? What
generic approach you are talking about ?
SqlSingleOperandTypeChecker STRING_INTEGER is used only in :
SqlFunction
REPEAT
SqlSingleOperandTypeChecker STRING_INTEGER_INTEGER only in
SqlSubstringFunction
STRING_STRING_INTEGER, STRING_STRING_INTEGER_INTEGER only in
SqlOverlayFunction
Is the desired behavior specific to the SUBSTRING function? Or
should it
be generic, for all functions that have an argument of type
INTEGER?
If it’s generic, can you give some other functions as examples
where we
would want this behavior.
Also, if it’s generic, the code should probably not be part of the
SUBSTRING function.
On Sep 4, 2024, at 8:31 AM, Norman Jordan
<norman.jor...@improving.com.INVALID> wrote:
This makes sense. Running a quick test with MySQL, I can see that
decimal values do not give an error. It appears that MySQL will
round a
decimal value to the nearest integer value.
________________________________
From: stanilovsky evgeny <estanilovs...@gridgain.com>
Sent: Wednesday, September 4, 2024 5:17 AM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: QUESTION: SUBSTRING implementation
Hi all, i want to discuss current SUBSTRING func implementation.
Lets take a standard and found:
<character substring function> ::=
SUBSTRING <left paren> <character value expression> FROM <start
position>
[ FOR <string length> ] <right paren>
and further : <start position> ::= <numeric value expression>
thus it not restrict <start position> for only integer types
Calcite documentation says:
SUBSTRING(string FROM integer FOR integer) (we see restrictions
here)
Lets dig deeper:
Calcite implementation operands checker not restrict operands too
:
1. OperandTypes.STRING_NUMERIC - (1 param: substring ('asd', 2))
(not
restricted params)
2. OperandTypes.STRING_INTEGER_INTEGER - (2 params: substring
('asd', 2,
3)) (only integer)
So if i call "SELECT SUBSTRING('asd', 1.2)" runtime exception will
occur:
java.lang.RuntimeException: while resolving method
'substring[class
java.lang.String, class java.math.BigDecimal]' in class class
org.apache.calcite.runtime.SqlFunctions
at
org.apache.calcite.adapter.enumerable.EnumUtils.call(EnumUtils.java:770)
at
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.call(RexImpTable.java:2866)
at
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.implementSafe(RexImpTable.java:2847)
So i appeal to align (1 and 2 operands checker implementation, so
for 2
operands it need: STRING_NUMERIC_NUMERIC) and append appropriate
implementation (with will cut off fractional numeric part) into
SqlFunctions.
wdyt ? if there will be no objections i will fill an issue.
thanks !
Warning: The sender of this message could not be validated and
may not
be the actual sender.