[
https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187090#comment-13187090
]
Mamta A. Satoor commented on DERBY-4631:
----------------------------------------
Here are answers to some of Mike's comments
********************************************************
Mike's comment
I am concerned by the performance characteristics of you proposed solution #2.
Could you go into more detail on what it involves and what set of queries it
will affect
********************************************************
As part of solution #2, I am proposing that during the bind phase, we are going
through the join query's column list, we replace the join column in the list
with coalesce function and make sure that newly added coalesce node to the
select list gets bound. This node will generate a call to following method in
DataTypejava
public DataValueDescriptor coalesce(DataValueDescriptor[]
argumentsList, DataValueDescriptor returnValue)
throws StandardException
{
// arguments list should have at least 2 arguments
if (SanityManager.DEBUG)
{
SanityManager.ASSERT(argumentsList != null,
"argumentsList expected to be non-null");
SanityManager.ASSERT(argumentsList.length > 1,
"argumentsList.length expected to be > 1");
}
/* Walk the arguments list until we find a non-null value.
Otherwise we will return null
*/
int index;
for (index = 0; index < argumentsList.length; index++)
{
if (!(argumentsList[index].isNull()))
{
returnValue.setValue(argumentsList[index]);
return returnValue;
}
}
returnValue.setToNull();
return returnValue;
}
In our case, there will always be only 2 arguments to the coalesce function and
the first non-null value will have us return with that value from the method
and if for some reason, if both the values are null, then we will return null
value. All this work will be done for joins using NATURAL JOINS or USING clause
which can happen for inner joins, left outer join and right outer joins. eg of
each of these kind of joins
CREATE TABLE derby4631_t1(x varchar(5));
INSERT INTO derby4631_t1 VALUES 'A','B';
CREATE TABLE derby4631_t2(x varchar(5));
INSERT INTO derby4631_t2 VALUES 'b','c';
SELECT x FROM derby4631_t2 NATURAL INNER JOIN derby4631_t1;
SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;
SELECT x FROM derby4631_t2 NATURAL LEFT OUTER JOIN derby4631_t1;
select x from derby4631_t2 INNER JOIN derby4631_t1 USING(x);
select x from derby4631_t2 RIGHT OUTERJOIN derby4631_t1 USING(x);
select x from derby4631_t2 LEFT OUTERJOIN derby4631_t1 USING(x);
Currently, at execution time, Derby already does special handling to figure out
if it is dealing with left outer join or right outer join and based on that, it
determines whether it should pick up the value from the left column or the
right column in the merged row(consisting of 2 columns, join column from left
and join column from right) for each of the join columns. Instead of this, now
we will let coalesce pick up
1)the left column's value if it is not null. If it is null then
2)pick up the right column's value if it is not null. If it is null then simply
return null.
So, there is special code happening with both the existing Derby behavior and
proposed Derby behavior but at this point, I am not sure how much
more/same/less performance impact the new changes might cause. This is all in
theory at this point. I have never worked on the code to replace a user
supplied column from the select list with another kind of node at bind time, so
if we do decide to go this path, I will be greatful to have community's
knowledge on this kind of change. I will work on it my own too to figure out
what needs to be done if we decide to go with this solution.
********************************************************
Mike's comment
I am assuming that the reason the code does not currently do an explicit
coalesce already is an optimization, where it was assumed the implementation
behavior would match the external behavior that the spec is describing,
********************************************************
You are probably right Mike, but I am not sure if coalesce functionality was
available in Derby when joins were implemented. It could very well be that we
had coalesce available but we didn't use it for optimization reasons but I just
wanted to raise that it might be a possibility that we never have had coalesce
available.
> Wrong join column returned by right outer join with NATURAL or USING and
> territory-based collation
> --------------------------------------------------------------------------------------------------
>
> Key: DERBY-4631
> URL: https://issues.apache.org/jira/browse/DERBY-4631
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.6.1.0
> Reporter: Knut Anders Hatlen
> Assignee: Mamta A. Satoor
> Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect
> 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right
> outer join big t2;
> X |X |X |4
> -----------------------
> A |NULL |A |A
> B |b |B |b
> C |c |C |c
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira