[
https://issues.apache.org/jira/browse/DERBY-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472439
]
Manish Khettry commented on DERBY-681:
--------------------------------------
Thanks for reviewing the patch. It will take me some time to make the patch
current and look at your comments. It has, after all, been a while since I
submitted the patch.
I am curious-- is it typical for a patch to gather dust for a few months before
someone finds the time to look at it? And if so, is this a good thing?
Thanks,
Manish
"A B (JIRA)" <[EMAIL PROTECTED]> wrote:
[
https://issues.apache.org/jira/browse/DERBY-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471852
]
A B commented on DERBY-681:
---------------------------
I noticed that this issue is now on the bottom of the "patch available"
list--and that the patch was posted almost two months ago. So I did a quick
review and my comments are below. Note: The patch is out of date (not
surprisingly); it would be great if a newer version could be re-attached that
incorporates the following comments (I am willing to continue the
review/discussion until the patch is committed).
I think the difference between a fetch size of 1 and a fetch size of 16 comes
down to the difference between a TableScan and BulkTableScan. I did a quick
search of the codeline and it looks like Derby will disable bulk fetching if
the result set is "ordering dependent" (see SelectNode.genProjectRestrict())
and also for certain min/max optimizations in a group by (see the
"considerPostOptimizeOptimization()" method of GroupByNode). My guess is that
your changes have somehow made it so that we no longer need to disable bulk
fetch for certain queries, and thus you are now seeing a different fetch size.
This seems particularly likely since all of the queries that show a 1 vs 16
difference in aggregateOptimization() come under the heading of "group by
ordered on grouping columns". So long story short, my feeling is that this is
an acceptable diff...
Hopefully someone will correct me if I'm wrong...
Also (w.r.t. "notes.txt" as attached to this issue):
-- The notes you wrote for "Background on Group By" would be great as
javadoc comments in the GroupByNode.addNewColumnsForAggregation() method
(in addition to what's already there).
-- The notes you wrote for "Having clause -> Design" would be great as
comments in the GroupByNode.addAggregateColumns() method (perhaps just
before the "if (havingClause != null)"...)
Other review comments (note: I haven't done much actual testing yet, I've just
looked at the code changes; I hope to do more testing of the changes next
week...):
I think it would be good if the following issues could be addressed before commit:
1) FromBaseTable.java:
-- It looks like you added an "accept()" method to FromBaseTable that
overrides ResultSetNode.accept(). I noticed that ResultSetNode.accept()
recursively calls "accept" on "resultColumns", but the new method in
FromBaseTable does not. This means that in cases where we used to accept
visitors for base table result columns we will no longer do so. I don't
know what the effects of that might be, but I think that's probably not
good. It would perhaps be better to call "super.accept()" at the start
of FromBaseTable.accept() and then go from there.
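To make the suggestion concrete, here is a minimal standalone sketch of the pattern (class and field names are borrowed from Derby for readability, but this is simplified illustration code, not the actual Derby classes or Visitor interface): the subclass override calls super.accept() first so resultColumns still gets visited, and only then handles its own extra children.

```java
// Illustrative sketch only -- simplified stand-ins for Derby's classes.
interface Visitor {
    Object visit(Object node);
}

class ResultSetNode {
    Object resultColumns = "RCL";

    Object accept(Visitor v) {
        // Superclass behavior: visit this node, then its result columns.
        v.visit(this);
        v.visit(resultColumns);
        return this;
    }
}

class FromBaseTable extends ResultSetNode {
    Object tableProperties = "props";

    @Override
    Object accept(Visitor v) {
        super.accept(v);          // keep visiting resultColumns as before
        v.visit(tableProperties); // then visit subclass-specific children
        return this;
    }
}
```

With this shape, any visitor that used to see the base table's result columns still does, and the override only adds behavior rather than silently dropping it.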
2) GroupByNode.java:
-- init() method has some lines that are commented out. You mention
these in your "notes.txt" file but there is no explanation as to why
they are commented out (and not just deleted) in the file itself;
might be good to add such comments (you could just take them from
notes.txt).
3) ResultColumnList.java:
-- With the following diff:
- int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+// int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+ int size = Math.min(size(), sourceRCL.size());
The original line (which is now commented out) looks like it assigns size
to be the *max* of size() and sourceRCL.size(); but your changes make it
use the minimum. I looked at the code and I think your fix is correct--
i.e. that we should be using the minimum size. So should the other line
just be deleted (instead of commented out)? Also, there appears to be an
implicit assumption that if the lists are two different sizes then the
shorter one must correspond to the _leading_ columns of the longer one.
If you're so inclined it might be nice to add a comment saying as much
(to go along with your change here).
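To illustrate why min is the safe bound here (standalone sketch with made-up names, not Derby code): if the shorter list is assumed to line up with the *leading* columns of the longer one, then iterating up to the max size would index past the end of the shorter list, while min touches only the positions that exist in both.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: copy entries pairwise between two lists of
// different lengths, under the assumption that the shorter list corresponds
// to the leading columns of the longer one.
class CopyLeading {
    static List<String> copyNames(List<String> target, List<String> source) {
        // min is the safe bound; max would run off the end of the shorter list
        int size = Math.min(target.size(), source.size());
        String[] out = target.toArray(new String[0]);
        for (int i = 0; i < size; i++) {
            out[i] = source.get(i); // only leading columns are overwritten
        }
        return Arrays.asList(out);
    }
}
```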
4) ColumnReference.java:
-- Unrelated (and perhaps accidental) code cleanup diff; better to leave
this out of the patch.
5) GroupByExpressionTest.java:
-- In the "testSubSelect()" method you added a test case that is
identical to the one immediately preceding it (so far as I can tell).
Was that intentional, or is there a HAVING clause missing?
The rest are nit-pick issues that I hope you might consider, though they should
not block commit of the patch:
6) SelectNode.java:
-- Some unrelated code cleanup; not a big deal, but as a general rule
it detracts from the patch (makes it hard to figure out what changes
are related to the issue and what changes are not). Ex. See the
bindNonVTITables() method.
-- I wonder if we really need a new "init()" method here? As far as I
can tell there are only two files that currently create a SELECT node:
DeleteNode.java and sqlgrammar.jj. The latter uses the new init()
method, the former uses the old one--but the only difference is the
presence of the "havingClause" argument. DeleteNode already passes
in several null values (with associated comments), so would it make
sense to just add a null for "havingClause", as well? Then we would
only need the one "init()" method for SelectNode. I admit, though,
that this is a stylistic detail and the code as you have it is correct.
So if you prefer to leave it as it is, that's fine--although it might
be good to add some minimal javadoc to the new init() method
(ex. "@param havingClause The HAVING clause, if any", to keep in line
with the old init() method).
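The consolidation being suggested looks something like the following standalone sketch (signatures heavily simplified and names illustrative -- the real init() methods take many more arguments): a single entry point that accepts the havingClause, with callers that have no HAVING clause passing null, as DeleteNode already does for its other unused arguments.

```java
// Illustrative sketch only -- not the actual Derby signatures.
class SelectNode {
    Object whereClause;
    Object havingClause;

    // Single init() method; havingClause may be null.
    void init(Object whereClause, Object havingClause) {
        this.whereClause = whereClause;
        this.havingClause = havingClause; // null => no HAVING clause
    }
}
```

A DeleteNode-style caller would then simply write `node.init(where, null /* havingClause */);` in keeping with its existing commented-null convention.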
7) GroupByNode.java:
-- Unnecessary import of JBitSet and Properties.
-- Javadoc for "init()" method doesn't mention "nestingLevel"
-- Looks like the following comment was deleted:
- // finally reset gbc to its new position.
Was that intentional?
8) GroupByList.java:
-- Might be nice if you could add some comments for the following line:
+ selectRCL.setCountMismatchAllowed(true);
i.e. maybe explain briefly why count mismatches are allowed for GroupBy?
9) ResultColumnList.java:
-- With your changes there is now the following "if" statement in
"propagateDCLInfo()":
/* Do both lists, if supplied by user, have the same degree? */
if (derivedRCL.size() != size() &&
! derivedRCL.getCountMismatchAllowed())
{
if (visibleSize() != derivedRCL.size()) {
throw
StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH,
tableName);
}
}
Could this be rewritten to:
/* Do both lists, if supplied by user, have the same degree? */
if (derivedRCL.size() != visibleSize() &&
! derivedRCL.getCountMismatchAllowed())
{
throw
StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH,
tableName);
}
As I said I haven't looked much at the new test cases nor have I done much
testing myself. I will hopefully have time to do more of that next week...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
> Eliminate the parser's rewriting of the abstract syntax tree for queries with
> GROUP BY and/or HAVING clauses
> ------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-681
> URL: https://issues.apache.org/jira/browse/DERBY-681
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Reporter: Rick Hillegas
> Assigned To: Manish Khettry
> Attachments: 681.patch.txt, notes.txt
>
>
> If a query contains a GROUP BY or HAVING clause, the parser rewrites the
> abstract syntax tree, putting aggregates into a subselect and treating the
> HAVING clause as the WHERE clause of a fabricated outer select from the
> subquery. This allows the compiler to re-use some machinery since the HAVING
> clause operates on the grouped result the way that the WHERE clause operates
> on the from list. Unfortunately, this rewriting creates an explosion of
> special cases in the compiler after parsing is done. The rewriting is not
> systematically handled later on in the compiler. This gives rise to defects
> like bug 280. We need to eliminate this special rewriting and handle the
> HAVING clause in a straightforward way. This is not a small bugfix but is a
> medium sized project.