The problem is that I don't know how to fix this. Would be glad to do the work if someone helped design a way to do it.
Turns out that the OperandTypeChecker can mutate the RexNodes it receives as arguments, but I am not sure that this is "safe", because it looks like the API has been designed with immutability in mind. For example, these combinators that can combine multiple type-checkers seem to assume immutability. Mihai ________________________________ From: Julian Hyde <jhyde.apa...@gmail.com> Sent: Friday, September 6, 2024 12:59 PM To: dev@calcite.apache.org <dev@calcite.apache.org> Subject: Re: QUESTION: SUBSTRING implementation I totally agree. Julian > On Sep 6, 2024, at 12:57, Mihai Budiu <mbu...@gmail.com> wrote: > > A problem with the way Calcite handles such functions is that the > type-checker and the runtime are separate and do not communicate with each > other. In the type checking code that people usually write for a function > they write an OperandTypeChecker, which makes some assumptions about the > range of implicit casts that the function allows. For example, for SUBSTRING > such a type checker would allow any numeric types, or even string types, with > the assmption that they would be converted to integer. As a result of the > type-checking, the function is accepted, but the implicit casts that were > inferred are not inserted as explicit casts to the arguments of the RexCall. > > So when the runtime has to implement the function it has no idea about the > kinds of arguments that may be expected, and what it's supposed to do with > them. The runtime has to handle many more cases, e.g,, INTEGER, TINYINT, > DECIMAL, STRING. This makes it difficult to keep the runtime in sync with the > type-checker as people implement new versions of the many functions from > different dialects. Moreover, the runtime may incorrectly convert such > values. For example, if you call decimal.intValue() you are not checking for > overflow and not rounding in the same way a CAST(decimal AS INTEGER) may do. > > Ideally the type-checker for functions would be allowed to insert implicit > casts, this would simplify a lot the task of both generating code in the > runtime, and of implementing the runtime itself. > > For substring the type-checker could insert a cast from DECIMAL to INTEGER. > The runtime now only has to worry about INTEGER values. > > Mihai > ________________________________ > From: Julian Hyde <jhyde.apa...@gmail.com> > Sent: Friday, September 6, 2024 12:44 PM > To: dev@calcite.apache.org <dev@calcite.apache.org> > Subject: Re: QUESTION: SUBSTRING implementation > > The trouble is, we are going beyond the standard. The standard says > > The declared type of a <start position>, <string length>, > <regex occurrence>, or <regex capture group> > shall be exact numeric with scale 0 (zero). > > People often misunderstand the “shall” in standards. It is talking about the > SQL query, not the implementation. A standards-compliant query will only have > integer values of <start position>, and compliant implementation (such as > Calcite) must handle such queries. > > The standard doesn’t say anything about what should happen if the query is > not standards-compliant, viz <start position> is not an integer. In > particular, it doesn’t say we should fail when we receive such queries. But > we can do anything, like return -77 or rounding the integer up, or rounding > it down, and still be compliant. > > What we have decided to do in Calcite is to apply our usual rules for > converting decimal arguments to integers. I am saying let’s define those > ‘usual rules’, and apply it to all functions, whether they are in the SQL > standard, in a library (such as Postgres or BigQuery) or user-defined > functions. > > > Julian > > > >> On Sep 6, 2024, at 1:02 AM, stanilovsky evgeny <estanilovs...@gridgain.com> >> wrote: >> >> I understand you are talking about, but ... LEFT is just extension under >> standard functions collection and DB implementors are free here, they can >> write restrictions in documentation [1] or treat fractional types as integer >> implicitly. I don`t think we need to change (LEFT, RIGHT and so on string >> functions), they already restricted in documentation, while standard >> (substring, overlay) need to be improved. >> >> [1] >> https://learn.microsoft.com/en-us/sql/t-sql/functions/left-transact-sql?view=sql-server-ver16 >> >>> On Thu, 05 Sep 2024 21:07:27 +0300, Julian Hyde <jh...@apache.org> wrote: >>> >>> Thanks for taking the time to look up the SQL standard. But >>> unfortunately the SQL standard isn't much use here, because it doesn't >>> say much about implicit conversions - it leaves that to the >>> implementor, such as Calcite or Postgres. >>> >>> Calcite has its own rules for implicit conversions of arguments. It >>> also has lots of functions that take numeric and integer arguments. >>> Calcite's users expect that the conversion rules are consistent across >>> all functions. Picking one at random, LEFT(string, length) is enabled >>> in several libraries including Postgres. You should implement the >>> conversion for SUBSTRING in such a way that LEFT gets it for free. >>> >>> On Thu, Sep 5, 2024 at 7:20 AM stanilovsky evgeny >>> <estanilovs...@gridgain.com> wrote: >>>> >>>> Thank all for reply, in section : 6.18 <string value function> >>>> I see only OVERLAY function that not corresponds here too: >>>> >>>> <character overlay function> ::= >>>> OVERLAY <left paren> <character value expression> PLACING <character value >>>> expression> >>>> FROM <start position> [ FOR <string length> ] >>>> [ USING <char length units> ] <right paren> >>>> >>>> <start position> ::= <numeric value expression> >>>> <string length> ::= <numeric value expression> >>>> >>>> or Julian you are talking not only about this kind of functions ? >>>> In such a case it consumes a lot of time for such a check. >>>> >>>> Why we can`t move here sequentially ? Just fix operand types ? What >>>> generic approach you are talking about ? >>>> SqlSingleOperandTypeChecker STRING_INTEGER is used only in : SqlFunction >>>> REPEAT >>>> SqlSingleOperandTypeChecker STRING_INTEGER_INTEGER only in >>>> SqlSubstringFunction >>>> STRING_STRING_INTEGER, STRING_STRING_INTEGER_INTEGER only in >>>> SqlOverlayFunction >>>> >>>> >>>>> Is the desired behavior specific to the SUBSTRING function? Or should it >>>>> be generic, for all functions that have an argument of type INTEGER? >>>>> >>>>> If it’s generic, can you give some other functions as examples where we >>>>> would want this behavior. >>>>> >>>>> Also, if it’s generic, the code should probably not be part of the >>>>> SUBSTRING function. >>>>> >>>>> >>>>>> On Sep 4, 2024, at 8:31 AM, Norman Jordan >>>>>> <norman.jor...@improving.com.INVALID> wrote: >>>>>> >>>>>> This makes sense. Running a quick test with MySQL, I can see that >>>>>> decimal values do not give an error. It appears that MySQL will round a >>>>>> decimal value to the nearest integer value. >>>>>> ________________________________ >>>>>> From: stanilovsky evgeny <estanilovs...@gridgain.com> >>>>>> Sent: Wednesday, September 4, 2024 5:17 AM >>>>>> To: dev@calcite.apache.org <dev@calcite.apache.org> >>>>>> Subject: QUESTION: SUBSTRING implementation >>>>>> >>>>>> Hi all, i want to discuss current SUBSTRING func implementation. >>>>>> >>>>>> Lets take a standard and found: >>>>>> <character substring function> ::= >>>>>> SUBSTRING <left paren> <character value expression> FROM <start >>>>>> position> >>>>>> [ FOR <string length> ] <right paren> >>>>>> >>>>>> and further : <start position> ::= <numeric value expression> >>>>>> >>>>>> thus it not restrict <start position> for only integer types >>>>>> >>>>>> Calcite documentation says: >>>>>> SUBSTRING(string FROM integer FOR integer) (we see restrictions here) >>>>>> >>>>>> Lets dig deeper: >>>>>> Calcite implementation operands checker not restrict operands too : >>>>>> 1. OperandTypes.STRING_NUMERIC - (1 param: substring ('asd', 2)) (not >>>>>> restricted params) >>>>>> 2. OperandTypes.STRING_INTEGER_INTEGER - (2 params: substring ('asd', 2, >>>>>> 3)) (only integer) >>>>>> >>>>>> So if i call "SELECT SUBSTRING('asd', 1.2)" runtime exception will >>>>>> occur: >>>>>> java.lang.RuntimeException: while resolving method 'substring[class >>>>>> java.lang.String, class java.math.BigDecimal]' in class class >>>>>> org.apache.calcite.runtime.SqlFunctions >>>>>>> at >>>>>>> org.apache.calcite.adapter.enumerable.EnumUtils.call(EnumUtils.java:770) >>>>>>> at >>>>>>> org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.call(RexImpTable.java:2866) >>>>>>> at >>>>>>> org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.implementSafe(RexImpTable.java:2847) >>>>>> >>>>>> So i appeal to align (1 and 2 operands checker implementation, so for 2 >>>>>> operands it need: STRING_NUMERIC_NUMERIC) and append appropriate >>>>>> implementation (with will cut off fractional numeric part) into >>>>>> SqlFunctions. >>>>>> >>>>>> wdyt ? if there will be no objections i will fill an issue. >>>>>> >>>>>> thanks ! >>>>>> Warning: The sender of this message could not be validated and may not >>>>>> be the actual sender. >