Paul Rogers created IMPALA-7944:
-----------------------------------
Summary: count(*) correctly has NDV=1 via being labeled as constant
Key: IMPALA-7944
URL: https://issues.apache.org/jira/browse/IMPALA-7944
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers
Assignee: Paul Rogers
The {{count\(*)}} function has an NDV of 1: the function always returns a
single value. This is important because it tells us that the query:
{code:sql}
SELECT COUNT(*) FROM foo
{code}
returns just one row. All good.
In the analyzer, we set a value of NDV=1 via an incorrect process: by labeling
{{count\(*)}} as constant:
* For historical reasons, NDV calculations occur before a node is analyzed.
* We use the default NDV calc: if the node is constant, set NDV = 1, else
compute it.
* Since the function node for {{count\(*)}} is not analyzed, we determine
constant-ness from an inspection.
* All checks for non-constantness fail, leaving the final check: a function is
constant if either a) it has no arguments, or b) all its arguments are constant.
* Since {{count\(*)}} has no expression arguments, and is not marked as
non-deterministic, we infer it must be constant.
* Therefore, its NDV is set to 1.
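The inference chain above can be sketched roughly as follows. This is a minimal illustration, not the Impala frontend code: {{NdvSketch}}, {{Node}}, {{Kind}}, {{looksConstant}}, {{inferNdv}} and {{UNKNOWN_NDV}} are all hypothetical names, and the "else compute it" branch is reduced to a sentinel value.

```java
import java.util.List;

// Hypothetical sketch only: none of these names come from the Impala frontend.
class NdvSketch {
    // Stand-in for "NDV would be computed from stats"; an assumption of this sketch.
    static final long UNKNOWN_NDV = -1;

    enum Kind { LITERAL, SLOT_REF, FUNCTION }

    static final class Node {
        final Kind kind;
        final String name;
        final boolean nonDeterministic;
        final List<Node> args; // expression arguments; empty for count(*)

        Node(Kind kind, String name, boolean nonDeterministic, List<Node> args) {
            this.kind = kind;
            this.name = name;
            this.nonDeterministic = nonDeterministic;
            this.args = args;
        }
    }

    // Constant-ness inferred by inspection, because the node is not yet analyzed.
    static boolean looksConstant(Node n) {
        if (n.kind == Kind.LITERAL) return true;
        if (n.kind == Kind.SLOT_REF) return false;
        if (n.nonDeterministic) return false;
        // Final check: constant if there are no arguments or all are constant.
        for (Node arg : n.args) {
            if (!looksConstant(arg)) return false;
        }
        return true;
    }

    // Default NDV rule: constant means NDV = 1, otherwise fall back to computing it.
    static long inferNdv(Node n) {
        return looksConstant(n) ? 1 : UNKNOWN_NDV;
    }

    public static void main(String[] args) {
        Node countStar = new Node(Kind.FUNCTION, "count", false, List.of());
        Node colC = new Node(Kind.SLOT_REF, "c", false, List.of());
        Node countC = new Node(Kind.FUNCTION, "count", false, List.of(colC));
        System.out.println("count(*) NDV: " + inferNdv(countStar));
        System.out.println("count(c) NDV: " + inferNdv(countC));
    }
}
```

Under these assumptions, {{count\(*)}} reaches NDV=1 only because its argument list is empty, while {{count(c)}} falls through to the computed path.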
This is, of course, highly unstable for multiple reasons:
* NDV calculations are done before the node is analyzed. This means, NDV
calculations for a {{SlotRef}} would fail because the ref has not yet been
resolved to a column. (The {{SlotRef}} has special code to work around this
fact.)
* The "treat zero-argument functions as constants and so use NDV=1" rule works
for {{count\(*)}}, but not for {{count(c)}}, nor for {{sum(c)}}, both of which
should have NDV=1.
* {{count\(*)}} is not really a constant; its NDV=1 setting should not rely
on (benignly) assuming it is.
* The const-ness seen by the NDV check is temporary; once the node is analyzed,
it is correctly marked as non-const. So, the calcs rely on one path saying the
function is const, another path saying it is not const.
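The last point, two paths giving different answers for the same node, can be illustrated with a small sketch. The class {{ConstnessFlipSketch}}, the {{FunctionCall}} type and its {{analyze}} method are hypothetical and do not mirror the real frontend; the rule that an aggregate is non-const after analysis is an assumption taken from the description above.

```java
import java.util.List;

// Hypothetical sketch only: illustrates const-ness changing across analysis.
class ConstnessFlipSketch {
    static final class FunctionCall {
        final String name;
        final List<String> columnArgs; // column arguments; empty for count(*)
        private boolean analyzed = false;
        private boolean constAfterAnalysis = false;

        FunctionCall(String name, List<String> columnArgs) {
            this.name = name;
            this.columnArgs = columnArgs;
        }

        // Path 1 (pre-analysis): guess from structure alone.
        // Path 2 (post-analysis): use the flag set during analysis.
        boolean isConstant() {
            if (!analyzed) return columnArgs.isEmpty();
            return constAfterAnalysis;
        }

        // Assumption: analysis knows whether the function is an aggregate,
        // and an aggregate is never treated as constant.
        void analyze(boolean isAggregate) {
            analyzed = true;
            constAfterAnalysis = !isAggregate && columnArgs.isEmpty();
        }
    }

    public static void main(String[] args) {
        FunctionCall countStar = new FunctionCall("count", List.of());
        System.out.println("before analysis: " + countStar.isConstant());
        countStar.analyze(true);
        System.out.println("after analysis:  " + countStar.isConstant());
    }
}
```

In this sketch an NDV taken before {{analyze}} is called depends on an answer that the same node later contradicts, which is the fragility described above.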
This should be cleaned up to provide a more reliable, understandable way of
achieving the goal of NDV=1.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)