[jira] Commented: (DERBY-3257) SELECT with HAVING clause containing OR conditional incorrectly return 1 row - should return 2 rows - works correctly with 10.2 DB

A B (JIRA) Fri, 18 Jan 2008 10:40:01 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560494#action_12560494
 ]


A B commented on DERBY-3257:
----------------------------

> I think I need a better understanding of why it is necessary for the 
> havingClause
> to be in CNF before the call to preprocess

This is a good question--and perhaps in the long run this is not requirement 
per se.  The only reason I thought to normalize the havingClause is because the 
OrNode.preprocess() method expects that the OrNode has been normalized.  I.e. 
it expects that the predicate:

  [not normalized] =>  (t0.CODE = 'GBR' OR t0.CODE = 'CHA')

becomes

  [normalized] => (t0.CODE = 'GBR' OR ((t0.CODE = 'CHA') OR FALSE))

When it transforms the OR list into an IN list, then, it just walks the tree 
and grabs the left operand from all of the chained ORs.  So in the normalized 
case we grab:

  1. t0.CODE = 'GBR'
  2. t0.CODE = 'CHA'

>From those two predicates we then get "t0.CODE in ('GBR', 'CHA')", which is 
>correct.  If the HAVING clause is not normalized, though, then the logic in 
>OrNode will only grab:

  1. t0.CODE = 'GBR'

because there is only one OrNode and that's its left operand.   So we end up 
with "t0.CODE in ('GBR')" and thus we miss the row for 'CHA'.

All of that said, maybe normalizing the HAVING clause is not the best solution, 
or perhaps it goes too far?  Maybe OrNode should be changed to recognize if it 
is NON-normalized and should not try to transform itself in that case?  If 
that's true, the follow-up question would be: Are there other places in the 
code that expect the having clause to be normalized, as well?

It seems to me that something in DERBY-681 has exposed the HAVING clause to 
processing that was previously only applied to WHERE clauses.  If that's true, 
and if the change was intentional, then the HAVING clause should (in theory) 
have to go through the same pre-processing steps as the WHERE clause--which 
includes normalization.  If that change in behavior was not intentional, 
though, then maybe the fix for this issue is to somehow prevent WHERE-clause 
-specific processing from happening on the HAVING clause.  I don't know for 
sure, but that last option sounds pretty hard...

> is it a problem to avoid [normalization] for SubqueryNodes?

I would follow this up with two other questions:

  1. What it is about normalization that causes problems for the SubqueryNode?  
It looks like the 42X24 error comes because the result column referenced within 
the subquery has a "source level" that is the same as the outer query--and that 
is (apparently?) because the SuqbueryNode somehow "disappears" as a result of 
the normalization.  That suggests that perhaps the Subquery is being flattened 
during, or as a result of, normalization.  Without normalization, the 
SubqueryNode hangs around and its result column gets a source level that is 
different from the outer query, so the check in 
GroupByNode.addNewColumnsForAggregation() passes.

  2. What happens if the same nested query appears in a WHERE clause and the 
WHERE clause is normalized?  Will the subquery be flattened during or after 
normalization?

These two questions don't answer yours, but perhaps some investigation of them 
will lead to an answer.  If the answer is "Yes, it's okay to avoid 
normalization for SubqueryNodes", then that leads to yet another question: are 
there other nodes for which it is okay to avoid normalization, as well?

> SELECT with HAVING clause containing OR conditional incorrectly return 1 row 
> - should return 2 rows - works correctly with 10.2 DB
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3257
>                 URL: https://issues.apache.org/jira/browse/DERBY-3257
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.3.2.1, 10.4.0.0
>            Reporter: Stan Bradbury
>            Assignee: Kathey Marsden
>         Attachments: 42X24_error.sql, d3257_doNOTCommit.patch, 
> derby-3257_plan_10.2.txt, derby-3257_plan_10.4.txt, TestHaving.java
>
>
> Attached program demonstrates the problem.  Only one count is returned 
> (matching CODE= GBR) - the count of CODE=CHA is not returned.  Works fine 
> with versions 10.1 and 10.2 or if program is run using 10.3 jars and 10.2 
> database (soft upgrade).
> Query:
> SELECT COUNT(t0.ID) FROM CTS1.TEST_TABLE t0 
>   GROUP BY t0.CODE 
>     HAVING (t0.CODE = 'GBR' OR t0.CODE = 'CHA') AND t0.CODE IS NOT NULL
> Incorrect results (see last line):
> Database product: Apache Derby
> Database version: 10.3.1.5 - (579866)
> Driver name:      Apache Derby Embedded JDBC Driver
> Driver version:   10.3.1.5 - (579866)
> result: 2
> Correct results:
> Database product: Apache Derby
> Database version: 10.2.2.0 - (485682)
> Driver name:      Apache Derby Embedded JDBC Driver
> Driver version:   10.2.2.0 - (485682)
> result: 4
> result: 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3257) SELECT with HAVING clause containing OR conditional incorrectly return 1 row - should return 2 rows - works correctly with 10.2 DB

Reply via email to