[ 
https://issues.apache.org/jira/browse/DERBY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724810#action_12724810
 ] 

Bryan Pendleton commented on DERBY-3002:
----------------------------------------

Hi Knut, thanks again for having a look at the patch. Your help is much 
appreciated!

I think that writing a simple GROUP BY benchmark would be an excellent next 
step. Is there such a benchmark readily available? If not, I'll put one 
together.

The changes to the run-time statistics are expected, because the algorithm has 
changed. Before, with the sort-observer technique, the sorter performed the 
grouping as a side effect, so the number of rows output from the sort was equal 
to the number of groups. Now we always compute aggregates in-line, so the sort 
step no longer "collapses" groups, and the sort outputs the same number of rows 
as it receives.
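
As a rough illustration of that change (a sketch only, not the code in the 
patch; the class and method names are made up for this example, and it assumes 
non-null grouping keys and a single SUM aggregate): the sort passes every row 
through, and a separate step walks the sorted rows and closes a group whenever 
the key changes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names exist in Derby.
final class InlineGroupBySketch {

    /** One input row: a grouping key and a value to be summed. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Plain sort: it returns exactly as many rows as it receives. */
    static List<Row> sortByKey(List<Row> input) {
        List<Row> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Row r) -> r.key));
        return sorted;
    }

    /** Walks the sorted rows and emits one SUM per group, in key order. */
    static Map<String, Long> aggregateSorted(List<Row> sorted) {
        Map<String, Long> sums = new LinkedHashMap<>();
        String currentKey = null;
        long runningSum = 0;
        for (Row r : sorted) {
            // A key change means the previous group is complete.
            if (currentKey != null && !currentKey.equals(r.key)) {
                sums.put(currentKey, runningSum);
                runningSum = 0;
            }
            currentKey = r.key;
            runningSum += r.value;
        }
        // Flush the final group, if there was any input at all.
        if (currentKey != null) {
            sums.put(currentKey, runningSum);
        }
        return sums;
    }
}
```

In this sketch the "rows output from the sort" figure is the size of the list 
returned by sortByKey, which always equals the input size; the number of groups 
only shows up in the result of aggregateSorted.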

The in-memory hash tables are only used for DISTINCT aggregates, and will hold 
one copy of every unique value of that particular column in that particular 
group. They could indeed run out of memory if the distribution of data were just 
right. I don't have a good intuition for (a) how often DISTINCT aggregates are 
used, (b) how many distinct values there tend to be per group, and (c) what 
sort of data types are used for DISTINCT aggregates. In my benchmark, I can try 
to construct a DISTINCT aggregate that uses an inordinate amount of memory, 
and we can see how it behaves.
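
To make the memory concern concrete, here is a minimal sketch of a 
COUNT(DISTINCT ...) over rows already sorted by group key. It is not the code 
in the patch: the names are hypothetical, and it assumes the per-group set is 
discarded when the group ends. Under that assumption, peak memory is driven by 
the largest number of distinct values in any single group.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names exist in Derby.
final class DistinctAggregateSketch {

    /** Distinct counts per group, plus the largest set ever held in memory. */
    static final class Result {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        int peakSetSize = 0;
    }

    /**
     * Each row is {groupKey, value}; rows must already be sorted by groupKey.
     * Keys and values are assumed to be non-null.
     */
    static Result countDistinct(List<String[]> sortedRows) {
        Result result = new Result();
        String currentKey = null;
        Set<String> seen = new HashSet<>();
        for (String[] row : sortedRows) {
            // A key change closes the group; start a fresh set for the next one.
            if (currentKey != null && !currentKey.equals(row[0])) {
                result.counts.put(currentKey, seen.size());
                seen = new HashSet<>();
            }
            currentKey = row[0];
            // Holds one copy of each unique value in the current group.
            seen.add(row[1]);
            result.peakSetSize = Math.max(result.peakSetSize, seen.size());
        }
        // Flush the final group, if there was any input at all.
        if (currentKey != null) {
            result.counts.put(currentKey, seen.size());
        }
        return result;
    }
}
```

A benchmark input with one very large group in which nearly every value is 
unique would push peakSetSize toward the row count, which is the kind of case 
worth measuring.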


> Add support for GROUP BY ROLLUP
> -------------------------------
>
>                 Key: DERBY-3002
>                 URL: https://issues.apache.org/jira/browse/DERBY-3002
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 10.4.1.3
>            Reporter: Bryan Pendleton
>            Assignee: Bryan Pendleton
>            Priority: Minor
>         Attachments: fixWhiteSpace.diff, IncludesASimpleTest.diff, 
> passesRegressionTests.diff, prototypeChangeNoTests.diff, 
> rewriteGroupByRS.diff, rollupNullability.diff, useLookahead.diff
>
>
> Provide an implementation of the ROLLUP form of multi-dimensional grouping 
> according to the SQL standard.
> See http://wiki.apache.org/db-derby/OLAPRollupLists for some more detailed 
> information about this aspect of the SQL standard.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
