[jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Mamta A. Satoor (Commented) (JIRA) Tue, 06 Dec 2011 21:49:14 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164165#comment-13164165
 ]


Mamta A. Satoor commented on DERBY-4631:
----------------------------------------

I debugged the code to figure out what (and where) are we doing in the code 
which causes us to give wrong results for join column in case of territory 
based database and right outer join with NATURAL or USING clause. As Knut 
pointed out earlier in this jira, as per the SQL spec, "the join columns in a 
natural join or in a named columns join should be added to the select list by 
coalescing the column from the left table with the column from the right table. 
" What I have found is that Derby decides to pick up join column's value from 
the left table when we are working with natural left outer join and it picks up 
the join column's value from the right table when we are working with natural 
right outer join. This is not a problem when we are dealing with non-territory 
based databases but the assumption to rely on just one table's join column is 
incorrect when working with territory based databases. Following is the test 
case I used for debugging which further explains Derby's current implementation.

connect 
'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
 
create table big(x varchar(5)); 
insert into big values 'A','B', null; 
create table small(x varchar(5)); 
insert into small values 'b','c', null; 
select * from small t1 natural left outer join big t2;
select * from small t1 natural right outer join big t2;

For both natural left outer join and natural right outer join, at execution 
time, we create a merged row which has columns merged from the left and right 
tables. The column(in my example, there is only one column)s in the select sql 
maps to a column from the merged row. The mapping is determined at the sql 
compile phase. 
In the case of 'select * from small t1 natural left outer join big t2', there 
will be three merged rows with 2 columns each
'b'  'B'
'c'  null
null null
And for natural left outer join, the generated code has column in the select 
SQL map to the first column in the merged row. This will always work fine even 
in a territory based database because as per the SQL standards, the column x 
should be equivalent to the return value of coalesce(t1.x, t2.x). Since we are 
working with left outer join, then if the first column in the merged row is 
null, then even the 2nd column(from the right table) will be null and hence it 
is ok to always pick up the value from the 1st column in the merged row. This 
mapping will always lead column x to have the same value as coalesce(t1.x, 
t2.x).

But for a territory based database, we can't count on a logic like that for 
natural right outer join. The way Derby works right now, the column x in the 
select always gets mapped to the 2nd column in the merged row. In the case of 
'select * from small t1 natural right outer join big t2', there will be three 
merged rows with 2 columns each
null 'A'
'b'  'B'
null null
And for natural right outer join, the generated code has column in the select 
SQL map to the second column in the merged row. This will work fine in a 
non-territory database, because if column 1 in the merged row has a non-null 
value, then it will always be the same value as the column 2 in the merged row. 
But in our example, with territor based database(with SECONDARY strength, 
meaning it is case insensitive comparison), values 'B' and 'b' are considered 
equal. Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd 
column of the merged row. For natural right outer join with the data given in 
the example above, 
coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of 
column x in the SELECT to the 2nd column in the merged row will return 'A', 'B' 
and null thus returning data which does not comply with SQL standard which says 
that column x's value should be the return value of coalesce(t1.x, t2.x). So it 
seems like may be we need some of kind projection in case of natural right 
outer join (rather than simple column mapping to the 2nd column which is what 
happens right now) so that we look at both the columns in the merged row to 
determine the value of column x.

Hope this explanation helps understand what Derby is doing internally and based 
on that, we can come up with some proposal to fix the issue.

                
> Wrong join column returned by right outer join with NATURAL or USING and 
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>              Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right 
> outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Reply via email to