I had tried to find answers in the sql-92 manual, but have yet to find specific
rules on how predicate "IN" should behave for implicit datatype conversions.
From what I understood, implementations of implicit datatype conversion is
implementation-defined(in our case decided by calcite).
Implementation-defined elements:
...
The term implementation-defined is used to identify characteristics
that may differ between implementations, but that shall be
defined
for each particular implementation.
...
6) Subclause 4.6, "Type conversions and mixing of data types": When
converting between numeric data types, if least significant dig-
its are lost, then it is implementation-defined whether rounding
or truncation occurs.
Thus, I had used postgres as reference to compare between different
implementations. I do agree on sql: " select … where x in (v1, v2)" giving the
same results as sql: " select … where x = v1 or x = v2" too.
I tried a few quick tests in reflective schema (as I could not find docs on
Calcite implicit datatype conversion precedence list) and concluded (correct me
if I'm wrong) that for the "=" operator in Calcite the behaviors are as follow:
1. the lower precision datatype will be converted to higher precision
datatypes prior comparison
2. for comparisons between smallint, int and bigint, conversion will be
done on the RHS variable of the operator "=" prior comparison
So, sql: " select … where x in (v1, v2)" does not give the same results as
sql: " select … where x = v1 or x = v2" for the 1st behavior above, e.g (tests
with reflective schema):
SQL: explain plan for select * from T2 where CBINT in (CDBL,CINT)
| PLAN(VARCHAR) |
|_|EnumerableCalc(expr#0..6=[{inputs}], expr#7=[CAST($t3):JavaType(class
java.lang.Long)], expr#8=[=($t2, $t7)], expr#9=[CAST($t1):JavaType(class
java.lang.Long)], expr#10=[=($t2, $t9)], expr#11=[OR($t8, $t10)],
proj#0..6=[{exprs}], $condition=[$t11])
EnumerableTableScan(table=[[TEST, T2]])
|_|
SQL: explain plan for select * from T2 where CBINT = CDBL or CBINT = CINT
| PLAN(VARCHAR) |
|_|EnumerableCalc(expr#0..6=[{inputs}], expr#7=[CAST($t2):JavaType(class
java.lang.Double)], expr#8=[=($t7, $t3)], expr#9=[CAST($t1):JavaType(class
java.lang.Long)], expr#10=[=($t2, $t9)], expr#11=[OR($t8, $t10)],
proj#0..6=[{exprs}], $condition=[$t11])
EnumerableTableScan(table=[[TEST, T2]])
|_|
SQL: select * from "T2"
| CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) | V1(VARCHAR)
| V2(VARCHAR) | CTS(TIMESTAMP) |
|_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
|_|0|_|null|_|0|_|0.5|_|half|_|h|_|1970-01-01 00:00:00.0|_|
|_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
|_|2|_|2|_|2|_|null|_|two|_|t|_|1970-01-01 00:00:00.002|_|
|_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
|_|4|_|4|_|null|_|4.0|_|four|_|f|_|1970-01-01 00:00:00.004|_|
|_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
SQL: select * from T2 where CBINT in (CDBL,CINT)
| CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) | V1(VARCHAR)
| V2(VARCHAR) | CTS(TIMESTAMP) |
|_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
|_|0|_|null|_|0|_|0.5|_|half|_|h|_|1970-01-01 00:00:00.0|_|
|_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
|_|2|_|2|_|2|_|null|_|two|_|t|_|1970-01-01 00:00:00.002|_|
|_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
|_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
SQL: select * from T2 where CBINT = CDBL or CBINT = CINT
| CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) | V1(VARCHAR)
| V2(VARCHAR) | CTS(TIMESTAMP) |
|_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
|_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
|_|2|_|2|_|2|_|null|_|two|_|t|_|1970-01-01 00:00:00.002|_|
|_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
|_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
Thanks & Regards,
Yuri Au Yong
Software Engineer - NPM | [email protected] | Desk: +60 3 7663 8372
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com
-----Original Message-----
From: Julian Hyde [mailto:[email protected]]
Sent: Thursday, October 29, 2015 9:22 AM
To: [email protected]
Subject: Re: Implicit DataType Conversion for IN predicate
A rule of thumb I have used is that
select … where x in (v1, v2)
should give the same result as
select … where x = v1 or x = v2
even taking into account implicit conversions of x, v1, and v2. Maybe this rule
of thumb is mistaken; I don’t know. It is interesting to see what Postgres
does, but the definitive answer is to look into the SQL standard.
What does the standard say on this matter?
Julian
> On Oct 25, 2015, at 7:17 PM, Yuri Au Yong <[email protected]> wrote:
>
> Hi,
>
> When using JDBC adapter with postgres db, I observed that the implicit
> DataType conversion for IN predicate is only done on the in value list and
> not according to a datatype precedence list. Is this expected?
>
> Consider T2:
>
> SQL: select * from "T2"
> | CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) |
> |V1(VARCHAR) | V2(VARCHAR) | CTS(TIMESTAMP) |
> |_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
> |_|0|_|null|_|0|_|0.5|_|half|_|h|_|1970-01-01 00:00:00.0|_|
> |_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
> |_|2|_|2|_|2|_|null|_|two|_|t|_|1970-01-01 00:00:00.002|_|
> |_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
> |_|4|_|4|_|null|_|4.0|_|four|_|f|_|1970-01-01 00:00:00.004|_|
> |_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
>
> I am able to get the same result when comparing “CBINT in (CDBL)” and
> “CDBL in (CBINT)” in postgres db
>
> When I apply the same query using JDBC Adaptor with postgres db, “CBINT in
> (CDBL)” will give different results from “CDBL in (CBINT)”
>
> SQL: explain plan for select * from T2 where CBINT in (CDBL)
> | PLAN(VARCHAR) |
> |_|EnumerableCalc(expr#0..6=[{inputs}],
> |expr#7=[CAST($t3):JavaType(class java.lang.Long)], expr#8=[=($t2,
> |$t7)], proj#0..6=[{exprs}], $condition=[$t8])
> EnumerableTableScan(table=[[TEST, T2]])
> |_|
> SQL: explain plan for select * from T2 where CDBL in (CBINT)
> | PLAN(VARCHAR) |
> |_|EnumerableCalc(expr#0..6=[{inputs}],
> |expr#7=[CAST($t2):JavaType(class java.lang.Double)], expr#8=[=($t3,
> |$t7)], proj#0..6=[{exprs}], $condition=[$t8])
> EnumerableTableScan(table=[[TEST, T2]])
> |_|
> SQL: select * from T2 where CBINT in (CDBL)
> | CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) |
> |V1(VARCHAR) | V2(VARCHAR) | CTS(TIMESTAMP) |
> |_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
> |_|0|_|null|_|0|_|0.5|_|half|_|h|_|1970-01-01 00:00:00.0|_|
> |_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
> |_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
> |_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
> selectSomething - SQL: select * from T2 where CDBL in (CBINT)
> | CSINT(SMALLINT) | CINT(INTEGER) | CBINT(BIGINT) | CDBL(DOUBLE) |
> |V1(VARCHAR) | V2(VARCHAR) | CTS(TIMESTAMP) |
> |_|0|_|0|_|0|_|0.0|_|zero|_|z|_|1970-01-01 00:00:00.0|_|
> |_|1|_|1|_|1|_|1.0|_|one|_|o|_|null|_|
> |_|3|_|3|_|3|_|3.0|_|three|_|null|_|1970-01-01 00:00:00.003|_|
> |_|5|_|5|_|5|_|5.0|_|null|_|n|_|1970-01-01 00:00:00.005|_|
>
> Thanks & Regards,
>
> Yuri Au Yong
> Software Engineer - NPM | [email protected]
> <mailto:[email protected]> | Desk: +60 3 7663 8372 Persistent
> Systems Ltd. | Partners in Innovation | www.persistentsys.com
> <http://www.persistentsys.com/>
> <https://tnpmsupport.persistentsys.com/training>
>
> DISCLAIMER ========== This e-mail may contain privileged and confidential
> information which is the property of Persistent Systems Ltd. It is intended
> only for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient, you are not authorized to read, retain, copy,
> print, distribute or use this message. If you have received this
> communication in error, please notify the sender and delete all copies of
> this message. Persistent Systems Ltd. does not accept any liability for virus
> infected mails.
>
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the
property of Persistent Systems Ltd. It is intended only for the use of the
individual or entity to which it is addressed. If you are not the intended
recipient, you are not authorized to read, retain, copy, print, distribute or
use this message. If you have received this communication in error, please
notify the sender and delete all copies of this message. Persistent Systems
Ltd. does not accept any liability for virus infected mails.