[jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Mamta A. Satoor (Commented) (JIRA) Fri, 13 Jan 2012 10:15:03 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185724#comment-13185724
 ]


Mamta A. Satoor commented on DERBY-4631:
----------------------------------------

Want to copy the commit comments from the revision r1230873. After the commit 
comments, I have proposal for 2 possible fixes for the problems described in 
the commit comments.
*************************************
DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING 
and territory-based collation

I am adding more tests for this jira to show the kind of joins and queries that 
are impacted by this defect.

In short, any join query which is either a NATURAL join or has a USING clause 
can run into the two problems as described in this jira.

Problem 1)As per SQL spec, the join column's value should be derived by
               COALESCE(leftTable.joinColumn, rightTable.joinColumn).
       But Derby has it's own rule for deriving the join column's value. 
Derby's implementation, for a right outer join, gets the join column's value 
from the right table and for left outer join, it gets the value from the left 
table. This logic works for most cases, but it can give incorrect value for a 
right outer join (with NATURAL JOIN or USING clause) in case of a territory 
based database. Additionally, the join column in the SELECT list(with NATURAL 
JOIN or USING clause) gets associated with the left table's join column(for 
inner joins and left outer joins) or it gets associated with the right table's 
join column(for rihgt outer joins). Since SQL spec requires the join column to 
be COALESCE ( leftTable.C, rightTable.C ) AS C, the join column should not be 
really associated with any of the 2 join tables.

Problem 2)The Derby's assocation of join column to left or right table as 
described in problem 1) causes it to allow incorrect queries. eg query
select i from t1_D3880 inner join t2_D3880 USING(i) group by t1_D3880.i;
       The query above works because join column i got associated with left 
table which is t1_D3880. If the query was rewritten to do the group by on right 
table, it would fail.
select i from t1_D3880 inner join t2_D3880 USING(i) group by t2_D3880.i;
*************************************


There are 2 solutions that we have talked about in the jira. 
Solution 1)Currently, at execution time, Derby creates a merged row for join 
column which has columns merged from the left and right tables and picks up the 
1st column's value for a left outer join and picks up the 2nd column's value 
for right outer join. (I don't remember debugging the inner join case but from 
it's behavior, it probably also picks up the value from the 1st column's value 
just like left outer join).
        Rather than this logic, we can change the code to pick the join 
column's value based on following logic 
1)if the left table's column value is null then pick up the right table's 
column's value. 
2)If the left table's column value is non-null, then pick up that value 
        We can implement this logic for all kinds of joins except the cross 
join(which doesn't allow NATURAL    JOIN or USING clause and hence we will 
never be able to run in the problem described by this jira or we can implement 
the logic only for right outer joins which is the only case which will run into 
problem because we do not implement COALESCE.  The problem with this solution 
is it will not catch Problem 2) described above because join column is getting 
associated with left table or right table depending on what kind of join we are 
working with.
Solution 2)This solution will take care of all the problems described earlier. 
With this solution, at bind time, we should replace the join column with 
COALESCE as described by SQL spec. This is a cleaner  solution then solution 1) 
because it takes care of all the problem cases but it can cause existing 
queries with group by using table name for association for join columns will 
stop working and will have to be rewritten. 

I lean towards solution 2) for it's cleanliness and direct implementation of 
SQL spec COALESCE behavior for join columns but I am interested to know what 
are the community's thoughts on this.

                
> Wrong join column returned by right outer join with NATURAL or USING and 
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Mamta A. Satoor
>              Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right 
> outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Reply via email to