[ 
https://issues.apache.org/jira/browse/DERBY-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644199#action_12644199
 ] 

A B commented on DERBY-3880:
----------------------------

I looked a bit more at this issue and I think I've discovered at least part of 
the problem.  A query like:

  select t1.i, avg(t2.i)
    from t1 inner join t2 on (t1.i = t2.i)
    group by t1.i having avg(t2.i) > 0;

has two aggregates in it, which means the compile time query tree has two 
AggregateNodes, each of which has an "operand" that is a column reference.  The 
result column to which the operand points is determined at bind time, and in 
this case the result column "t2.i" points to a bind-time JoinNode.  The 
JoinNode's bind-time result column list is simply a concatenation of the result 
column lists from its operands.  In our example, the query specifies T1 
followed by T2, so the JoinNode's bind-time RCL is: { T1.i, T1.c, T2.i, T2.c2, 
T2.i2 }.  Notice how, with respect to this list, the position of "t2.i" is "3". 
 So when we bind the AggregateNodes, each of them ends up with an operand that 
is a ColumnReference which points to the "t2.i" column of the bind-time 
JoinNode, which in turn has virtual column id of "3".

After binding is complete we start optimization, the first phase of which is 
preprocessing.  As part of SelectNode preprocessing we look at the FromList to 
see if there are any JoinNodes that can be flattened.  In the above query the 
JoinNode *can* be flattened, which means it effectively disappears.  That in 
turn means that any result columns in the SELECT list have to be updated to 
reflect the fact that they no longer point to the JoinNode; instead, they point 
directly to the underlying base tables.  The process of flattening the JoinNode 
and updating the SelectNode's result column list begins with the following call 
in SelectNode.preprocess():

    // Flatten any flattenable FromSubquerys or JoinNodes
    fromList.flattenFromTables(resultColumns, 
        wherePredicates, 
        whereSubquerys,
      groupByList);

Notice how we pass the SelectNode's RCL into the flattenFromTables method, but 
we do *not* pass the HAVING clause.  Long story short, the AggregateNode in the 
SelectNode's RCL is properly updated to reflect the removal of the 
JoinNode--but the AggregateNode in the HAVING clause continues to point to a 
result column from the now-useless JoinNode.  The *real* column to which the 
Aggregate should be pointing is actually buried *beneath* the JoinNode's 
ResultColumn, so the aggregate's operand should have been remapped to point to 
the buried column. (And indeed, that's exactly what happens with the aggregate 
in the SelectNode's RCL.)

When optimization is done and we modify the tree, result columns are updated to 
reflect the final join order and also to remove any un-referenced columns.  But 
since the HAVING aggregate node is still pointing to the unused JoinNode result 
column, it doesn't see any changes--it continues to point to a ResultColumn 
that has a virtual column id of "3" and which points to a JoinNode that is no 
longer valid.

The final query plan inclues a *new* JoinNode between T1 and T2, and that 
JoinNode's RCL only contains columns which are referenced in the query.  Thus 
the execution-time JoinNode result column list is, for the above query, { T1.i, 
T2.i }.  Note how it only has TWO columns in it, not five like the bind-time 
JoinNode.  So when it comes time to execute the query, the aggregate in the 
having clause ends up pointing to column "3" of the new JoinNode's RCL, and 
since that doesn't exist, the column value is set to null (as Kathey noted in 
earlier comments).  Attempts to reference that column for GROUP BY sorting, 
then, result in the NPE.

The reason the same query works if the two aggregates reference two _different_ 
columns in T2 has nothing to do with equivalence--that was a red herring, sorry 
for the miscue.  The reason the query passes appears to have something to do 
with referenced columns--esp. the SelectNode (which includes the Select RCL and 
the Having clause) then references TWO columns from T2 plus ONE column from T1, 
leading to THREE columns in the execution JoinNode's RCL: { T1.i, T2.i, T2.i2 
}.  So now the RCL has three columns instead of two, which means the result 
column in the HAVING aggregate, which still has the INCORRECT virtual id of 
"3", accidentally points to a valid column, and thus the query passes--but that 
was just chance, the code is still technically wrong.

This also explains why the original query passes if the user does NOT specify 
"inner join".  The inclusion of "inner join" leads to the creation of the 
bind-time JoinNode between T1 and T2, against which the aggregates are then 
bound.  In preprocessing we realize the JoinNode is useless so we flatten it 
out of the query, and that causes the aforementioned problem.  But if the query 
does not have "inner join", the bind-time JoinNode is never created, which 
means both aggregate nodes already point directly to the underlying base 
tables.  Throughout optimization and query generation the base table result 
columns are properly updated, which means both aggregates end up pointing to 
the correct place, and all is well.

So all of that said, I think the fix for this problem is to somehow ensure that 
the HAVING clause's AggregateNode is correctly remapped when/if necessary, esp. 
if it points to a JoinNode that has been flattened out.  I happened to notice 
that, when the JoinNode is flattened, all of its result columns are marked as 
"redundant" as part of the flattening process (see JoinNode.flatten()).  So the 
quick (and possibly NOT complete) fix that I found was to add an "if" statement 
to the getNewExpressionResultColumn() method of AggregateNode.java.  That 
method is called as part of "modification of access paths"--i.e. after 
optimization has completed--and is used for creating the result column that 
will ultimately be used for aggregate input at execution time.  I found that if 
I added logic to remap the Aggregate's operand (if it's a column reference), 
then the posted query starts running without error:

Index: java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java    
(revision 707244)
+++ java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java    
(working copy)
@@ -539,6 +539,26 @@
                        this.getNewNullResultExpression() :
                        operand;

 +        /* The operand for this aggregate node was initialized at bind
 +         * time.  Between then and now it's possible that certain changes
 +         * have been made to the query tree which affect this operand. In
 +         * particular, if the operand was pointing to a result column in
 +         * a JoinNode and then that JoinNode was flattened during pre-
 +         * processing, all of the references to that JoinNode--including
 +         * this aggregate's operand--need to be updated to reflect the
 +         * fact that the Join Node no longer exists.  So check to see if
 +         * the operand is a column reference, and if so, make a call to
 +         * remap it to its underlying expression.  If nothing has happened
 +         * then this will be a no-op; but if something has changed to void
 +         * out the result column to which the operand points, the result
 +         * column will be marked "redundant" and the following call should
 +         * remap as appropriate. DERBY-3880.
 +         */
 +        if (operand instanceof ColumnReference)
 +        {
 +            ((ColumnReference)operand).remapColumnReferencesToExpressions();
 +        }
 +
            return (ResultColumn) getNodeFactory().getNode(
                                                                
C_NodeTypes.RESULT_COLUMN,
                                                                "##aggregate 
expression",

I didn't run any other tests, and I admit that this change may not be 
sufficient (or it may be too general), but it does cause the reported query to 
run without error, so at the very least it demonstrates the problem.  There 
could very well be a better way to fix it...

> NPE on a query with having clause involving a join
> --------------------------------------------------
>
>                 Key: DERBY-3880
>                 URL: https://issues.apache.org/jira/browse/DERBY-3880
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>         Environment: Windows 2003 Server 
>            Reporter: Venkateswaran Iyer
>            Priority: Minor
>         Attachments: AggregateExpressionResultColumn.txt, querytree_fail.txt, 
> querytree_works.txt
>
>
> A simple query involving a join and having clause causes a NPE. Any 
> subsequent executions cause severe errors. It almost looks like the 
> underlying connection was closed out.
> ====
> C:\apps\derby\db-derby-10.4.2.0-bin\db-derby-10.4.2.0-bin\bin>ij
> ij version 10.4
> ij> connect 'jdbc:derby://speed:1527/ClassicModels;user=sa;password=sa
> ';
> ij> create table t1(i int, c varchar(20));
> 0 rows inserted/updated/deleted
> ij> create table t2(i int, c2 varchar(20), i2 int);
> 0 rows inserted/updated/deleted
> ij> insert into t1 values(1, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t1 values(2, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'xyz', 10);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'aaa', 20);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(2, 'xxx', 30);
> 1 row inserted/updated/deleted
> ij> select t1.i, avg(t2.i2) from t1 inner join t2 on (t1.i = t2.i) group by 
> t1.i
>  having avg(t2.i2) > 0;
> ERROR XJ001: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ001, SQLERRMC: 
> java.lang.
> NullPointerException¶¶XJ001.U

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to