[
https://issues.apache.org/jira/browse/DERBY-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472439
]
Manish Khettry commented on DERBY-681:
--------------------------------------
Thanks for reviewing the patch. It will take me some time to make the patch
current and look at your comments. It has, after all, been a while since I
submitted the patch.
I am curious-- is it typical for a patch to gather dust for a few months before
someone finds the time to look at it? And if so, is this a good thing?
Thanks,
Manish
"A B (JIRA)" <[EMAIL PROTECTED]> wrote:
[
https://issues.apache.org/jira/browse/DERBY-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471852
]
A B commented on DERBY-681:
---------------------------
I noticed that this issue is now on the bottom of the "patch available"
list--and that the patch was posted almost two months ago. So I did a quick
review and my comments are below. Note: The patch is out of date (not
surprisingly); it would be great if a newer version could be re-attached that
incorporates the following comments (I am willing to continue the
review/discussion until the patch is committed).
I think the difference between a fetch size of 1 and a fetch size of 16 comes
down to the difference between a TableScan and BulkTableScan. I did a quick
search of the codeline and it looks like Derby will disable bulk fetching if
the result set is "ordering dependent" (see SelectNode.genProjectRestrict())
and also for certain min/max optimizations in a group by (see the
"considerPostOptimizeOptimization()" method of GroupByNode). My guess is that
your changes have somehow made it so that we no longer need to disable bulk
fetch for certain queries, and thus you are now seeing a different fetch size.
This seems particularly likely since all of the queries that show a 1 vs 16
difference in aggregateOptimization() come under the heading of "group by
ordered on grouping columns". So long story short, my feeling is that this is
an acceptable diff...
Hopefully someone will correct me if I'm wrong...
Also (w.r.t. "notes.txt" as attached to this issue):
-- The notes you wrote for "Background on Group By" would be great as
javadoc comments in the GroupByNode.addNewColumnsForAggregation() method
(in addition to what's already there).
-- The notes you wrote for "Having clause -> Design" would be great as
comments in the GroupByNode.addAggregateColumns() method (perhaps just
before the "if (havingClause != null)"...)
Other review comments (note: I haven't done much actual testing yet, I've just
looked at the code changes; I hope to do more testing of the changes next
week...):
I think it would be good if the following issues could be addressed before commit:
1) FromBaseTable.java:
-- It looks like you added an "accept()" method to FromBaseTable that
overrides ResultSetNode.accept(). I noticed that ResultSetNode.accept()
recursively calls "accept" on "resultColumns", but the new method in
FromBaseTable does not. This means that in cases where we used to accept
visitors for base table result columns we will no longer do so. I don't
know what the effects of that might be, but I think that's probably not
good. It would perhaps be better to call "super.accept()" at the start
of FromBaseTable.accept() and then go from there.
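To make the suggestion concrete, here is a minimal standalone sketch of the pattern (class and field names are borrowed from Derby for readability, but this is simplified illustration code, not the actual Derby classes or Visitor interface): the subclass override calls super.accept() first so resultColumns still gets visited, and only then handles its own extra children.

```java
// Illustrative sketch only -- simplified stand-ins for Derby's classes.
interface Visitor {
    Object visit(Object node);
}

class ResultSetNode {
    Object resultColumns = "RCL";

    Object accept(Visitor v) {
        // Superclass behavior: visit this node, then its result columns.
        v.visit(this);
        v.visit(resultColumns);
        return this;
    }
}

class FromBaseTable extends ResultSetNode {
    Object tableProperties = "props";

    @Override
    Object accept(Visitor v) {
        super.accept(v);          // keep visiting resultColumns as before
        v.visit(tableProperties); // then visit subclass-specific children
        return this;
    }
}
```

With this shape, any visitor that used to see the base table's result columns still does, and the override only adds behavior rather than silently dropping it.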
2) GroupByNode.java:
-- init() method has some lines that are commented out. You mention
these in your "notes.txt" file but there is no explanation as to why
they are commented out (and not just deleted) in the file itself;
might be good to add such comments (you could just take them from
notes.txt).
3) ResultColumnList.java:
-- With the following diff:
- int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+// int size = (size() > sourceRCL.size()) ? size() : sourceRCL.size();
+ int size = Math.min(size(), sourceRCL.size());
The original line (which is now commented out) looks like it assigns size
to be the *max* of size() and sourceRCL.size(); but your changes make it
use the minimum. I looked at the code and I think your fix is correct--
i.e. that we should be using the minimum size. So should the other line
just be deleted (instead of commented out)? Also, there appears to be an
implicit assumption that if the lists are two different sizes then the
shorter one must correspond to the _leading_ columns of the longer one.
If you're so inclined it might be nice to add a comment saying as much
(to go along with your change here).
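To illustrate why min is the safe bound here (standalone sketch with made-up names, not Derby code): if the shorter list is assumed to line up with the *leading* columns of the longer one, then iterating up to the max size would index past the end of the shorter list, while min touches only the positions that exist in both.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: copy entries pairwise between two lists of
// different lengths, under the assumption that the shorter list corresponds
// to the leading columns of the longer one.
class CopyLeading {
    static List<String> copyNames(List<String> target, List<String> source) {
        // min is the safe bound; max would run off the end of the shorter list
        int size = Math.min(target.size(), source.size());
        String[] out = target.toArray(new String[0]);
        for (int i = 0; i < size; i++) {
            out[i] = source.get(i); // only leading columns are overwritten
        }
        return Arrays.asList(out);
    }
}
```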
4) ColumnReference.java:
-- Unrelated (and perhaps accidental) code cleanup diff; better to leave
this out of the patch.
5) GroupByExpressionTest.java:
-- In the "testSubSelect()" method you added a test case that is
identical to the one immediately preceding it (so far as I can tell).
Was that intentional, or is there a HAVING clause missing?
The rest are nit-pick issues that I hope you might consider, though they should
not block commit of the patch:
6) SelectNode.java:
-- Some unrelated code cleanup; not a big deal, but as a general rule
it detracts from the patch (makes it hard to figure out what changes
are related to the issue and what changes are not). Ex. See the
bindNonVTITables() method.
-- I wonder if we really need a new "init()" method here? As far as I
can tell there are only two files that currently create a SELECT node:
DeleteNode.java and sqlgrammar.jj. The latter uses the new init()
method, the former uses the old one--but the only difference is the
presence of the "havingClause" argument. DeleteNode already passes
in several null values (with associated comments), so would it make
sense to just add a null for "havingClause", as well? Then we would
only need the one "init()" method for SelectNode. I admit, though,
that this is a stylistic detail and the code as you have it is correct.
So if you prefer to leave it as it is, that's fine--although it might
be good to add some minimal javadoc to the new init() method
(ex. "@param havingClause The HAVING clause, if any", to keep in line
with the old init() method).
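The consolidation being suggested looks something like the following standalone sketch (signatures heavily simplified and names illustrative -- the real init() methods take many more arguments): a single entry point that accepts the havingClause, with callers that have no HAVING clause passing null, as DeleteNode already does for its other unused arguments.

```java
// Illustrative sketch only -- not the actual Derby signatures.
class SelectNode {
    Object whereClause;
    Object havingClause;

    // Single init() method; havingClause may be null.
    void init(Object whereClause, Object havingClause) {
        this.whereClause = whereClause;
        this.havingClause = havingClause; // null => no HAVING clause
    }
}
```

A DeleteNode-style caller would then simply write `node.init(where, null /* havingClause */);` in keeping with its existing commented-null convention.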
7) GroupByNode.java:
-- Unnecessary import of JBitSet and Properties.
-- Javadoc for "init()" method doesn't mention "nestingLevel"
-- Looks like the following comment was deleted:
- // finally reset gbc to its new position.
Was that intentional?
8) GroupByList.java:
-- Might be nice if you could add some comments for the following line:
+ selectRCL.setCountMismatchAllowed(true);
i.e. maybe explain briefly why count mismatches are allowed for GroupBy?
9) ResultColumnList.java:
-- With your changes there is now the following "if" statement in
"propagateDCLInfo()":
/* Do both lists, if supplied by user, have the same degree? */
if (derivedRCL.size() != size() &&
! derivedRCL.getCountMismatchAllowed())
{
if (visibleSize() != derivedRCL.size()) {
throw
StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH,
tableName);
}
}
Could this be rewritten to:
/* Do both lists, if supplied by user, have the same degree? */
if (derivedRCL.size() != visibleSize() &&
! derivedRCL.getCountMismatchAllowed())
{
throw
StandardException.newException(SQLState.LANG_DERIVED_COLUMN_LIST_MISMATCH,
tableName);
}
As I said I haven't looked much at the new test cases nor have I done much
testing myself. I will hopefully have time to do more of that next week...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
> Eliminate the parser's rewriting of the abstract syntax tree for queries with
> GROUP BY and/or HAVING clauses
> ------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-681
> URL: https://issues.apache.org/jira/browse/DERBY-681
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Reporter: Rick Hillegas
> Assigned To: Manish Khettry
> Attachments: 681.patch.txt, notes.txt
>
>
> If a query contains a GROUP BY or HAVING clause, the parser rewrites the
> abstract syntax tree, putting aggregates into a subselect and treating the
> HAVING clause as the WHERE clause of a fabricated outer select from the
> subquery. This allows the compiler to re-use some machinery since the HAVING
> clause operates on the grouped result the way that the WHERE clause operates
> on the from list. Unfortunately, this rewriting creates an explosion of
> special cases in the compiler after parsing is done. The rewriting is not
> systematically handled later on in the compiler. This gives rise to defects
> like bug 280. We need to eliminate this special rewriting and handle the
> HAVING clause in a straightforward way. This is not a small bugfix but is a
> medium sized project.