[jira] [Updated] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Mamta A. Satoor (Updated) (JIRA) Mon, 30 Jan 2012 20:40:58 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mamta A. Satoor updated DERBY-4631:
-----------------------------------

    Attachment: DERBY_4631_not_for_commit_patch1_stat.txt
                DERBY_4631_not_for_commit_patch1_diff.txt

I have a patch which is not ready for commit. It is a work in progress for the 
solution 1) proposed earlier in the jira which is as follows
Solution 1) requires that we change the code generation to following logic. 
1)if the left table's join column value is null then pick up the right table's 
join column's value. 
2)If the left table's join column value is non-null, then pick up that value 
We should have the solution 1) kickin for only RIGHT OUTER JOIN with USING 
clause or NATURAL JOIN. 
LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN will work 
correctly with current Derby logic which is to always pickup the left table's 
join column value. 
This will work for LEFT OUTER JOINs and INNER JOINs with USING clause or 
NATURAL JOIN in both territory and non-territory based databases


The attached patch now makes the following query return the correct results
java -Dij.exceptionTrace=true org.apache.derby.tools.ij 
connect 
'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
 
CREATE TABLE derby4631_t1(x varchar(5)); 
INSERT INTO derby4631_t1 VALUES 'A','B'; 
CREATE TABLE derby4631_t2(x varchar(5)); 
INSERT INTO derby4631_t2 VALUES 'b','c'; 
SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; 
ij> X
-----
A
b

The patch in it's current state is quite a bit of hacking because currently a 
ResultColumn does not need to know if it belongs to a Join ResultSet. But with 
solution 1), if ResultColumn is a join column part of a RIGHT OUTER JOIN with 
USING/NATURAL JOIN, then we want the join column's value to be determined using 
the solution 1) described above. For that, a ResultColumn will now need to know 
if it is part of a RIGHT OUTER JOIN with USING/NATURAL JOIN and it will need to 
know the column positions of the left table's
join column and rigt table's join colunm and it will need to know the resultset 
number of the resultset from which these join colunms's values will be 
extracted. 

I get these values in this patch by putting extra information related to joins 
in ResultColunm at bind time and using them at code generation time.

This patch may break many other queries and I will continue to work on the 
patch to make it more stable but I wanted to put the general approach for this 
solution out sooner than later to see if community has any feedback on keeping 
the extra information on ResultColumn needed to implement solution 1). 
Although, all this additional information maintenance is pretty localized and 
not many files are impacted by this solution. 

Next I plan to work on the patch more to make it stable and do more testing 
with it to see how it will work for the rest of the queries. 

Alternative to this approach would be to introduce a new kind of compilation 
node which will be created during the bind phase(unlike most nodes which get 
created during parsing) when we find that the ResultColumn belongs to Join 
Node(which means we will still have to do the checking I do in this patch to 
see if ResultColumn is part of RIGHT OUTER JOIN with USING/NATURAL JOIN but at 
code generation time, 
we can have the new node do this special code generation which is how we handle 
all the other special nodes like Conditional If node, Coalesce node etc). This 
approach of adding new node will require us to somehow fire the binding of the 
new node after replacing the ResultColumn which was created during the Parse 
time. At this point, I am unfamiliar with how to replace a node during the bind 
time with some other node and make it go through the binding step. Also, 
replacing the ResultColumn with a new node might also impact queries like 
following where I think order by column from right table is associated with the 
join column in the select list
SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 ORDER BY 
derby4631_t1.x; 


The files changed by this patch are as follows
$ svn stat -q
M       java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
M       java\engine\org\apache\derby\impl\sql\compile\ProjectRestrictNode.java
M       java\engine\org\apache\derby\impl\sql\compile\JoinNode.java
M       java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
M       java\engine\org\apache\derby\impl\sql\compile\HashTableNode.java

Following explains the changes in brief for the files touched by this patch
ResultColumn.java has additioanl columns to keep RIGHT OUTER JOIN with 
USING/NATURAL JOIN information. These
columns will get updated during bind time if ResultColumn is part of such a 
join. In my next patch, I think I should be able to remove 
virtualColumnIdLeftTable and virtualColumnIdRightTable and just get that 
information
from joinResultSet. 

The signature of ResultColumnList.mapSourceColumns has changed and hence the 
changes in 
ProjectRestrictNode.java and HashTableNode.java.

JoinNode.java - checks if it is RIGHT OUTER JOIN with USING/NATURAL JOIN and if 
yes, then it marks the right 
table's join column's ResultColumn to recognize that.

ResultColumnList.java in it's code generation logic checks if the ResultColumn 
is a join column for RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then 
it generates the following code for that column
1)if the left table's join column value is null then pick up the right table's 
join column's value. 
2)If the left table's join column value is non-null, then pick up that value 

Will appreciate any feedback on this patch. I plan to work more on it to clean 
it up by looking at removing some of the redundant new informaiton in 
ResultColumn and also to check if there is a better place for code generation 
for a join column part of RIGHT OUTER JOIN with USING/NATURAL JOIN. I also 
anticipate existing queries failing with this current patch. I will work on 
identifying such queries. But I wanted to get feedback on general approach of 
this patch.

                
> Wrong join column returned by right outer join with NATURAL or USING and 
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Mamta A. Satoor
>              Labels: derby_triage10_8
>         Attachments: DERBY_4631_not_for_commit_patch1_diff.txt, 
> DERBY_4631_not_for_commit_patch1_stat.txt
>
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right 
> outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Reply via email to