[
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mamta A. Satoor updated DERBY-4631:
-----------------------------------
Attachment: DERBY_4631_not_for_commit_patch1_stat.txt
DERBY_4631_not_for_commit_patch1_diff.txt
I have a patch which is not ready for commit. It is a work in progress for the
solution 1) proposed earlier in the jira which is as follows
Solution 1) requires that we change the code generation to following logic.
1)if the left table's join column value is null then pick up the right table's
join column's value.
2)If the left table's join column value is non-null, then pick up that value
We should have the solution 1) kickin for only RIGHT OUTER JOIN with USING
clause or NATURAL JOIN.
LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN will work
correctly with current Derby logic which is to always pickup the left table's
join column value.
This will work for LEFT OUTER JOINs and INNER JOINs with USING clause or
NATURAL JOIN in both territory and non-territory based databases
The attached patch now makes the following query return the correct results
java -Dij.exceptionTrace=true org.apache.derby.tools.ij
connect
'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
CREATE TABLE derby4631_t1(x varchar(5));
INSERT INTO derby4631_t1 VALUES 'A','B';
CREATE TABLE derby4631_t2(x varchar(5));
INSERT INTO derby4631_t2 VALUES 'b','c';
SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;
ij> X
-----
A
b
The patch in it's current state is quite a bit of hacking because currently a
ResultColumn does not need to know if it belongs to a Join ResultSet. But with
solution 1), if ResultColumn is a join column part of a RIGHT OUTER JOIN with
USING/NATURAL JOIN, then we want the join column's value to be determined using
the solution 1) described above. For that, a ResultColumn will now need to know
if it is part of a RIGHT OUTER JOIN with USING/NATURAL JOIN and it will need to
know the column positions of the left table's
join column and rigt table's join colunm and it will need to know the resultset
number of the resultset from which these join colunms's values will be
extracted.
I get these values in this patch by putting extra information related to joins
in ResultColunm at bind time and using them at code generation time.
This patch may break many other queries and I will continue to work on the
patch to make it more stable but I wanted to put the general approach for this
solution out sooner than later to see if community has any feedback on keeping
the extra information on ResultColumn needed to implement solution 1).
Although, all this additional information maintenance is pretty localized and
not many files are impacted by this solution.
Next I plan to work on the patch more to make it stable and do more testing
with it to see how it will work for the rest of the queries.
Alternative to this approach would be to introduce a new kind of compilation
node which will be created during the bind phase(unlike most nodes which get
created during parsing) when we find that the ResultColumn belongs to Join
Node(which means we will still have to do the checking I do in this patch to
see if ResultColumn is part of RIGHT OUTER JOIN with USING/NATURAL JOIN but at
code generation time,
we can have the new node do this special code generation which is how we handle
all the other special nodes like Conditional If node, Coalesce node etc). This
approach of adding new node will require us to somehow fire the binding of the
new node after replacing the ResultColumn which was created during the Parse
time. At this point, I am unfamiliar with how to replace a node during the bind
time with some other node and make it go through the binding step. Also,
replacing the ResultColumn with a new node might also impact queries like
following where I think order by column from right table is associated with the
join column in the select list
SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 ORDER BY
derby4631_t1.x;
The files changed by this patch are as follows
$ svn stat -q
M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
M java\engine\org\apache\derby\impl\sql\compile\ProjectRestrictNode.java
M java\engine\org\apache\derby\impl\sql\compile\JoinNode.java
M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
M java\engine\org\apache\derby\impl\sql\compile\HashTableNode.java
Following explains the changes in brief for the files touched by this patch
ResultColumn.java has additioanl columns to keep RIGHT OUTER JOIN with
USING/NATURAL JOIN information. These
columns will get updated during bind time if ResultColumn is part of such a
join. In my next patch, I think I should be able to remove
virtualColumnIdLeftTable and virtualColumnIdRightTable and just get that
information
from joinResultSet.
The signature of ResultColumnList.mapSourceColumns has changed and hence the
changes in
ProjectRestrictNode.java and HashTableNode.java.
JoinNode.java - checks if it is RIGHT OUTER JOIN with USING/NATURAL JOIN and if
yes, then it marks the right
table's join column's ResultColumn to recognize that.
ResultColumnList.java in it's code generation logic checks if the ResultColumn
is a join column for RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then
it generates the following code for that column
1)if the left table's join column value is null then pick up the right table's
join column's value.
2)If the left table's join column value is non-null, then pick up that value
Will appreciate any feedback on this patch. I plan to work more on it to clean
it up by looking at removing some of the redundant new informaiton in
ResultColumn and also to check if there is a better place for code generation
for a join column part of RIGHT OUTER JOIN with USING/NATURAL JOIN. I also
anticipate existing queries failing with this current patch. I will work on
identifying such queries. But I wanted to get feedback on general approach of
this patch.
> Wrong join column returned by right outer join with NATURAL or USING and
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
> Key: DERBY-4631
> URL: https://issues.apache.org/jira/browse/DERBY-4631
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.6.1.0
> Reporter: Knut Anders Hatlen
> Assignee: Mamta A. Satoor
> Labels: derby_triage10_8
> Attachments: DERBY_4631_not_for_commit_patch1_diff.txt,
> DERBY_4631_not_for_commit_patch1_stat.txt
>
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right
> outer join big t2;
> X |X |X |4
> -----------------------
> A |NULL |A |A
> B |b |B |b
> C |c |C |c
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira