James Taylor created PHOENIX-2988:
-------------------------------------
Summary: Replace COUNT(DISTINCT...) with COUNT(...) when possible
Key: PHOENIX-2988
URL: https://issues.apache.org/jira/browse/PHOENIX-2988
Project: Phoenix
Issue Type: Sub-task
Reporter: James Taylor
An optimization that would really benefit the SELECT COUNT(DISTINCT pkCol)
case: if there's only a single COUNT(DISTINCT pkCol) and the GroupBy ends up
being order preserving, you can replace the COUNT(DISTINCT pkCol) with a
COUNT(pkCol) in the select expression nodes. That'll prevent the
DistinctValueWithCountServerAggregator, which keeps a Map of all unique
values, from being used. Instead we'd just keep a single overall count, which
is all we need thanks to your DistinctPrefixFilter.
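
To illustrate why the plain count is enough, here is a minimal standalone sketch. It does not use any Phoenix classes: {{firstOfEachRun}} is a hypothetical stand-in for what a filter over sorted keys would emit, and all class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountDistinctSketch {

    /** Map-based counting: memory grows with the number of unique values. */
    public static long countDistinctWithMap(List<String> keys) {
        Map<String, Integer> seen = new HashMap<>();
        for (String key : keys) {
            seen.merge(key, 1, Integer::sum);
        }
        return seen.size();
    }

    /**
     * Hypothetical stand-in for a filter over sorted, non-null keys:
     * emits only the first key of each run of equal keys.
     */
    public static List<String> firstOfEachRun(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String key : sortedKeys) {
            if (!key.equals(previous)) {
                out.add(key);
            }
            previous = key;
        }
        return out;
    }

    /** Plain counting: a single running total, no per-value state. */
    public static long countPlain(List<String> keys) {
        long count = 0;
        for (int i = 0; i < keys.size(); i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "a", "b", "c", "c");
        // Both approaches agree once duplicates are dropped upstream.
        System.out.println(countDistinctWithMap(sorted));
        System.out.println(countPlain(firstOfEachRun(sorted)));
    }
}
```

When the input is already reduced to one row per distinct key, the two counts agree, so the per-value Map adds cost without changing the result.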
A few considerations in the implementation:
* Pass select through in the call to groupBy.compile() in QueryCompiler and
replace the COUNT(DISTINCT ...) in place.
* The same replacement needs to be made in the HAVING clause. We have a
ParseNodeRewriter that'll help with that. You could do that by creating a
derived class, overriding the {{visitLeave(final FunctionParseNode node,
List<ParseNode> nodes)}} method to return a new COUNT parse node with the
{{nodes}} passed in as children if {{node}} equals the DistinctCountParseNode
that you replaced in the select statement.
* The compilation of the HAVING clause should be moved after the call to
groupBy.compile() in QueryCompiler, like this:
{code}
groupBy = groupBy.compile(context, select, innerPlanTupleProjector);
Expression having = HavingCompiler.compile(context, select, groupBy);
{code}
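
The HAVING rewrite described above could look roughly like the following standalone sketch. It deliberately avoids Phoenix's real classes: {{Node}} and {{rewrite}} are hypothetical stand-ins for ParseNode and a ParseNodeRewriter subclass, and the tree shape is an assumption for illustration. The idea shown is the bottom-up visit: children are rewritten first, and a node equal to the replaced COUNT(DISTINCT ...) is rebuilt as a plain COUNT over those children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class CountDistinctRewriteSketch {

    /** Hypothetical stand-in for a parse node; Phoenix's real ParseNode differs. */
    public static final class Node {
        final String name;
        final boolean distinct;
        final List<Node> children;

        public Node(String name, boolean distinct, Node... children) {
            this.name = name;
            this.distinct = distinct;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Node)) {
                return false;
            }
            Node other = (Node) o;
            return name.equals(other.name)
                    && distinct == other.distinct
                    && children.equals(other.children);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, distinct, children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            if (distinct) {
                sb.append("DISTINCT ");
            }
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Rebuilds the tree bottom-up. A node equal to {@code replaced} becomes a
     * non-distinct COUNT over the already-rewritten children; every other node
     * is copied with its rewritten children.
     */
    public static Node rewrite(Node node, Node replaced) {
        List<Node> rewritten = new ArrayList<>();
        for (Node child : node.children) {
            rewritten.add(rewrite(child, replaced));
        }
        Node[] kids = rewritten.toArray(new Node[0]);
        if (node.equals(replaced)) {
            return new Node("COUNT", false, kids);
        }
        return new Node(node.name, node.distinct, kids);
    }

    public static void main(String[] args) {
        // The node that was replaced in the select expression nodes.
        Node replaced = new Node("COUNT", true, new Node("pk", false));
        // HAVING COUNT(DISTINCT pk) > 10, written as a prefix tree.
        Node having = new Node(">", false,
                new Node("COUNT", true, new Node("pk", false)),
                new Node("10", false));
        System.out.println(rewrite(having, replaced)); // prints ">(COUNT(pk), 10)"
    }
}
```

Matching by equality rather than identity matters here, since the HAVING clause would normally hold its own instance of the COUNT(DISTINCT ...) node rather than the one from the select list.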
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)