[jira] [Updated] (PHOENIX-2988) Replace COUNT(DISTINCT...) with COUNT(...) when possible

James Taylor (JIRA) Sun, 12 Jun 2016 00:13:27 -0700

     [ 
https://issues.apache.org/jira/browse/PHOENIX-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


James Taylor updated PHOENIX-2988:
----------------------------------
    Description: 
An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) 
case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up 
being order preserving, you can replace the COUNT(DISTINCT pkCol) with a 
COUNT(pkCol) in the SELECT, HAVING, and ORDER BY clauses. That prevents the 
DistinctValueWithCountServerAggregator, which keeps a Map of all unique values, 
from being used. Instead, we just keep a single overall count, which is all we 
need thanks to your DistinctPrefixFilter.
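
To illustrate the difference between the two aggregation strategies, here is a minimal, self-contained Java sketch. It does not use any Phoenix classes: DistinctCountSketch, countWithMap and countDeduplicated are hypothetical names. It also assumes that the input to the second method has already been reduced to one row per distinct value, which is the role this description assigns to DistinctPrefixFilter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Phoenix.
final class DistinctCountSketch {

    // Map-based approach: remembers every unique value, so memory grows
    // with the number of distinct values.
    static long countWithMap(List<String> values) {
        Map<String, Integer> seen = new HashMap<>();
        for (String v : values) {
            seen.merge(v, 1, Integer::sum);
        }
        return seen.size();
    }

    // Counter-based approach: valid only when the scan already yields one
    // row per distinct value (assumed here), so a single long is enough.
    static long countDeduplicated(List<String> onePerDistinctValue) {
        long count = 0;
        for (String ignored : onePerDistinctValue) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "a", "b", "c", "c", "c");
        // As if an upstream filter had already dropped the duplicates.
        List<String> filtered = Arrays.asList("a", "b", "c");
        System.out.println(countWithMap(raw));
        System.out.println(countDeduplicated(filtered));
    }
}
```

Both calls report the same number of distinct values, but only the first needs state proportional to that number.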

A few considerations in the implementation:
* Pass through select in the call to groupBy.compile() in QueryCompiler and 
change the return type to return a new select (as the SELECT, HAVING, and ORDER 
BY may have been rewritten). Probably easiest if the GroupBy object is just 
mutated in place.
* Within the groupBy.compile() call, use a visitor on the SELECT, HAVING and 
ORDER BY clauses to do the rewriting. You can do that by deriving a class from 
ParseNodeRewriter, overriding the {{visitLeave(final FunctionParseNode node, 
List<ParseNode> nodes)}} method to return a new COUNT parse node with the 
{{nodes}} passed in as children if {{node}} equals the DistinctCountParseNode 
that you replaced in the select statement.
* The compilation of the HAVING clause should be moved after the call to 
groupBy.compile() in QueryCompiler, since the clause may have been rewritten in 
that call:
{code}
        select = groupBy.compile(context, select, innerPlanTupleProjector);
        Expression having = HavingCompiler.compile(context, select, groupBy);
{code}
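
The rewriting step described in the second bullet can be sketched as follows. This is a toy model, not Phoenix code: RewriteSketch, Node, node() and rewrite() are hypothetical stand-ins for the real parse node classes and for a ParseNodeRewriter subclass, and the string names COUNT_DISTINCT and COUNT are placeholders. It only shows the shape of the idea: children are rewritten first, and a node equal to the targeted distinct count is rebuilt as a plain count over the rewritten children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for parse nodes; Phoenix's real classes differ.
final class RewriteSketch {

    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }

        // Structural equality, so an equal node in HAVING or ORDER BY
        // matches the one found in SELECT.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node other = (Node) o;
            return name.equals(other.name) && children.equals(other.children);
        }

        @Override public int hashCode() {
            return Objects.hash(name, children);
        }

        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    static Node node(String name, Node... children) {
        return new Node(name, Arrays.asList(children));
    }

    // Bottom-up rewrite: rewrite the children, then rebuild the node,
    // mirroring the visitLeave(node, nodes) idea from the description.
    static Node rewrite(Node current, Node target) {
        List<Node> rewritten = new ArrayList<>();
        for (Node child : current.children) {
            rewritten.add(rewrite(child, target));
        }
        if (current.equals(target)) {
            // Swap only the distinct count that was selected for replacement.
            return new Node("COUNT", rewritten);
        }
        return new Node(current.name, rewritten);
    }

    public static void main(String[] args) {
        Node target = node("COUNT_DISTINCT", node("pk"));
        Node having = node(">", node("COUNT_DISTINCT", node("pk")), node("10"));
        System.out.println(rewrite(having, target));
    }
}
```

Applying the same rewrite to the SELECT, HAVING and ORDER BY trees keeps the three clauses consistent, which is why the HAVING clause has to be compiled after the rewrite has run.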


  was:
An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) 
case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up 
being order preserving, you can replace the COUNT(DISTINCT pkCol) with a 
COUNT(pkCol) in the select expression nodes. That prevents the 
DistinctValueWithCountServerAggregator, which keeps a Map of all unique values, 
from being used. Instead, we just keep a single overall count, which is all we 
need thanks to your DistinctPrefixFilter.

A few considerations in the implementation:
* Pass through select in the call to groupBy.compile() in QueryCompiler and 
replace the COUNT(DISTINCT ...) in place.
* The same replacements need to be done for the HAVING clause. We have a 
ParseNodeRewriter that'll help with that. You could do that by creating a 
derived class, overriding the {{visitLeave(final FunctionParseNode node, 
List<ParseNode> nodes)}} method to return a new COUNT parse node with the 
{{nodes}} passed in as children if {{node}} equals the DistinctCountParseNode 
that you replaced in the select statement.
* The compilation of the HAVING clause should be moved after the call to 
groupBy.compile() in QueryCompiler, like this:
{code}
        groupBy = groupBy.compile(context, select, innerPlanTupleProjector);
        Expression having = HavingCompiler.compile(context, select, groupBy);
{code}


> Replace COUNT(DISTINCT...) with COUNT(...) when possible
> --------------------------------------------------------
>
>                 Key: PHOENIX-2988
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2988
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>             Fix For: 4.8.0
>
>
> An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol) 
> case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up 
> being order preserving, you can replace the COUNT(DISTINCT pkCol) with a 
> COUNT(pkCol) in the SELECT, HAVING, and ORDER BY clauses. That prevents the 
> DistinctValueWithCountServerAggregator, which keeps a Map of all unique 
> values, from being used. Instead, we just keep a single overall count, which 
> is all we need thanks to your DistinctPrefixFilter.
> A few considerations in the implementation:
> * Pass through select in the call to groupBy.compile() in QueryCompiler and 
> change the return type to return a new select (as the SELECT, HAVING, and 
> ORDER BY may have been rewritten). Probably easiest if the GroupBy object is 
> just mutated in place.
> * Within the groupBy.compile() call, use a visitor on the SELECT, HAVING and 
> ORDER BY clauses to do the rewriting. You can do that by deriving a class 
> from ParseNodeRewriter, overriding the {{visitLeave(final FunctionParseNode 
> node, List<ParseNode> nodes)}} method to return a new COUNT parse node with 
> the {{nodes}} passed in as children if {{node}} equals the 
> DistinctCountParseNode that you replaced in the select statement.
> * The compilation of the HAVING clause should be moved after the call to 
> groupBy.compile() in QueryCompiler, since the clause may have been rewritten 
> in that call:
> {code}
>         select = groupBy.compile(context, select, innerPlanTupleProjector);
>         Expression having = HavingCompiler.compile(context, select, groupBy);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
