[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535934#comment-13535934
 ] 

Knut Anders Hatlen commented on DERBY-6017:
-------------------------------------------

I've tried to interpret what the standard says. Here are the relevant parts 
I've found:

> 8.4 <in predicate> - Syntax Rules
>
> 2) Let IVL be an <in value list>.
> ( IVL )
> is equivalent to the <table value constructor>:
> ( VALUES IVL )

So, according to this rule, the following two queries should be equivalent 
(which they are not currently):

ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
9223372036854775807, 9.223372036854776E18);
X                   
--------------------
9223372036854775805 

1 row selected
ij> select * from t where x in (values 9223372036854775805, 
9223372036854775806, 9223372036854775807, 9.223372036854776E18);
X                   
--------------------
9223372036854775805 
9223372036854775806 
9223372036854775807 

3 rows selected

Furthermore, it says:

> 8.4 <in predicate> - Syntax Rules
>
> 5) The expression
> RVC IN IPV
> is equivalent to
> RVC = ANY IPV

So to find the correct semantics for IN, we need to rewrite the query to ANY. 
That is,

select * from t where x = any (values 9223372036854775805, 9223372036854775806, 
9223372036854775807, 9.223372036854776E18);

and see what the standard says about that. (This particular ANY query returns 
three rows in Derby, which is the same as the IN (VALUES ...) query above.)

This leads us to:

> 8.8 <quantified comparison predicate> - Syntax Rules
>
> 1) Let RV1 and RV2 be <row value predicand>s whose declared types are 
> respectively that of the <row value
> predicand> and the row type of the <table subquery>. The Syntax Rules of 
> Subclause 8.2, “<comparison
> predicate>”, are applied to:
> RV1 <comp op> RV2

That is, for the comparisons, the value on the right hand side should have the 
row type of the sub-query.

And the row type of our VALUES sub-query is DOUBLE (or at least some 
approximate numeric type) as 7.3 <table value constructor> says row type is 
determined by applying Subclause 9.3, “Data types of results of aggregations”, 
whose syntax rule 3d says:

> If any data type in DTS is approximate numeric, then each data type in DTS 
> shall be numeric and the
> result data type is approximate numeric with implementation-defined precision.

Derby does produce the right type for the <table value constructor>:

ij> values 9223372036854775805, 9223372036854775806, 9223372036854775807, 
9.223372036854776E18;
1                     
----------------------
9.223372036854776E18  
9.223372036854776E18  
9.223372036854776E18  
9.223372036854776E18  

4 rows selected

The ANY query should therefore end up like:

  select * from t where x = 9.223372036854776E18 or x = 9.223372036854776E18 or 
x = 9.223372036854776E18 or x = 9.223372036854776E18;

Or even simpler, because the DOUBLE representation of all four values happens 
to be the same:

  select * from t where x = 9.223372036854776E18;

Now, 8.2 <comparison predicate> - General Rules, says this:

> 2) Numbers are compared with respect to their algebraic value.

No more details than that, I'm afraid. And no mentioning about converting the 
operands to the dominant type, so far as I can see.

Derby currently returns these three rows for the query:

ij> select * from t where x = 9.223372036854776E18;
X                   
--------------------
9223372036854775805 
9223372036854775806 
9223372036854775807 

3 rows selected

I'm not completely convinced that all those three values have the same 
algebraic value as 9.223372036854776E18. But in any case I think changing how 
Derby performs numeric comparisons is outside the scope of this issue.

So how's this for a plan? In this issue, let's assume Derby's equality 
comparison operator does the right thing. The goal for now should be to make an 
<in value list> behave the same way as the ANY query the SQL standard says it 
should be equivalent to. We should have tests that use the results from the 
equivalent ANY queries as canons, and those tests would also alert us if we 
later make changes to the comparison operator in a way that makes ANY and IN 
behave inconsistently.
                
> IN lists with constants may return wrong results
> ------------------------------------------------
>
>                 Key: DERBY-6017
>                 URL: https://issues.apache.org/jira/browse/DERBY-6017
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.9.1.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>
> Given this table:
> ij> connect 'jdbc:derby:memory:db;create=true';
> ij> create table t(x bigint);
> 0 rows inserted/updated/deleted
> ij> insert into t values 9223372036854775805, 9223372036854775806, 
> 9223372036854775807;
> 3 rows inserted/updated/deleted
> A query that uses an IN list that contains all the three values actually 
> stored in the table, returns all three rows as expected:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
> 9223372036854775807);
> X                   
> --------------------
> 9223372036854775805 
> 9223372036854775806 
> 9223372036854775807 
> 3 rows selected
> However, if we add a value whose type precedence is higher, like a DOUBLE 
> value, and that value happens to be equal to the approximation of the other 
> values in the IN list when they are cast from BIGINT to DOUBLE, only one row 
> is returned:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
> 9223372036854775807, 9.223372036854776E18);
> X                   
> --------------------
> 9223372036854775805 
> 1 row selected
> I believe this query should return all three rows too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to