[
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164165#comment-13164165
]
Mamta A. Satoor commented on DERBY-4631:
----------------------------------------
I debugged the code to figure out what (and where) are we doing in the code
which causes us to give wrong results for join column in case of territory
based database and right outer join with NATURAL or USING clause. As Knut
pointed out earlier in this jira, as per the SQL spec, "the join columns in a
natural join or in a named columns join should be added to the select list by
coalescing the column from the left table with the column from the right table.
" What I have found is that Derby decides to pick up join column's value from
the left table when we are working with natural left outer join and it picks up
the join column's value from the right table when we are working with natural
right outer join. This is not a problem when we are dealing with non-territory
based databases but the assumption to rely on just one table's join column is
incorrect when working with territory based databases. Following is the test
case I used for debugging which further explains Derby's current implementation.
connect
'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
create table big(x varchar(5));
insert into big values 'A','B', null;
create table small(x varchar(5));
insert into small values 'b','c', null;
select * from small t1 natural left outer join big t2;
select * from small t1 natural right outer join big t2;
For both natural left outer join and natural right outer join, at execution
time, we create a merged row which has columns merged from the left and right
tables. The column(in my example, there is only one column)s in the select sql
maps to a column from the merged row. The mapping is determined at the sql
compile phase.
In the case of 'select * from small t1 natural left outer join big t2', there
will be three merged rows with 2 columns each
'b' 'B'
'c' null
null null
And for natural left outer join, the generated code has column in the select
SQL map to the first column in the merged row. This will always work fine even
in a territory based database because as per the SQL standards, the column x
should be equivalent to the return value of coalesce(t1.x, t2.x). Since we are
working with left outer join, then if the first column in the merged row is
null, then even the 2nd column(from the right table) will be null and hence it
is ok to always pick up the value from the 1st column in the merged row. This
mapping will always lead column x to have the same value as coalesce(t1.x,
t2.x).
But for a territory based database, we can't count on a logic like that for
natural right outer join. The way Derby works right now, the column x in the
select always gets mapped to the 2nd column in the merged row. In the case of
'select * from small t1 natural right outer join big t2', there will be three
merged rows with 2 columns each
null 'A'
'b' 'B'
null null
And for natural right outer join, the generated code has column in the select
SQL map to the second column in the merged row. This will work fine in a
non-territory database, because if column 1 in the merged row has a non-null
value, then it will always be the same value as the column 2 in the merged row.
But in our example, with territor based database(with SECONDARY strength,
meaning it is case insensitive comparison), values 'B' and 'b' are considered
equal. Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd
column of the merged row. For natural right outer join with the data given in
the example above,
coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of
column x in the SELECT to the 2nd column in the merged row will return 'A', 'B'
and null thus returning data which does not comply with SQL standard which says
that column x's value should be the return value of coalesce(t1.x, t2.x). So it
seems like may be we need some of kind projection in case of natural right
outer join (rather than simple column mapping to the 2nd column which is what
happens right now) so that we look at both the columns in the merged row to
determine the value of column x.
Hope this explanation helps understand what Derby is doing internally and based
on that, we can come up with some proposal to fix the issue.
> Wrong join column returned by right outer join with NATURAL or USING and
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
> Key: DERBY-4631
> URL: https://issues.apache.org/jira/browse/DERBY-4631
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.6.1.0
> Reporter: Knut Anders Hatlen
> Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right
> outer join big t2;
> X |X |X |4
> -----------------------
> A |NULL |A |A
> B |b |B |b
> C |c |C |c
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira