[ https://issues.apache.org/jira/browse/IMPALA-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated IMPALA-7944:
--------------------------------
    Description:
The {{count(*)}} function has an NDV of 1: the function always returns a single value. This is important because it tells us that the query:

{code:sql}
SELECT COUNT(*) FROM foo
{code}

returns just one row. All good.

In the analyzer, we set a value of NDV=1 via an incorrect process: by labeling {{count(*)}} as constant:

* For historical reasons, NDV calculations occur before a node is analyzed.
* We use the default NDV calc: if the node is constant, set NDV = 1, else compute it.
* Since the function node for {{count(*)}} is not analyzed, we determine constant-ness from an inspection.
* All checks for non-constantness fail, leaving the final check: a function is constant if either a) it has no arguments, or b) all its arguments are constant.
* Since {{count(*)}} has no expression arguments, and is not marked as non-deterministic, we infer it must be constant.
* Therefore, its NDV is set to 1.

This is, of course, highly unstable for multiple reasons:

* NDV calculations are done before the node is analyzed. This means NDV calculations for a {{SlotRef}} would fail because the ref has not yet been resolved to a column. (The {{SlotRef}} has special code to work around this fact.)
* The "treat zero-argument functions as constants and so use NDV=1" rule works for {{count(*)}}, but not for {{count(c)}} or {{sum(c)}}, both of which should have NDV=1.
* {{count(*)}} is not really a constant; its NDV=1 setting should not rely on (benignly) assuming it is.
* The const-ness seen by the NDV check is temporary; once the node is analyzed, it is correctly marked as non-const. So, the calcs rely on one path saying the function is const and another path saying it is not.

This should be cleaned up to provide a more reliable, understandable way of achieving the goal of NDV=1.
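The inference chain described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Impala code: the class {{NdvSketch}}, the {{FnCall}} stand-in, and the methods {{looksConstant}} and {{defaultNdv}} are hypothetical names, and the "else compute it" branch is replaced by a caller-supplied placeholder value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NdvSketch {
    /** Hypothetical stand-in for an unanalyzed function-call node. */
    public static final class FnCall {
        final String name;
        final List<Boolean> argIsConstant; // one entry per expression argument
        final boolean nonDeterministic;

        public FnCall(String name, List<Boolean> argIsConstant, boolean nonDeterministic) {
            this.name = name;
            this.argIsConstant = argIsConstant;
            this.nonDeterministic = nonDeterministic;
        }
    }

    /**
     * Pre-analysis guess at const-ness: a call that is not marked
     * non-deterministic is treated as constant if it has no arguments
     * or if all of its arguments are constant.
     */
    public static boolean looksConstant(FnCall fn) {
        if (fn.nonDeterministic) return false;
        // allMatch on an empty stream is true, so zero arguments => "constant".
        return fn.argIsConstant.stream().allMatch(Boolean::booleanValue);
    }

    /** Default NDV rule: 1 for constants, otherwise the computed value. */
    public static long defaultNdv(FnCall fn, long computedNdv) {
        return looksConstant(fn) ? 1L : computedNdv;
    }

    public static void main(String[] args) {
        // count(*) carries no expression arguments in this model.
        FnCall countStar = new FnCall("count", Collections.<Boolean>emptyList(), false);
        // count(c) has one non-constant argument (a column reference).
        FnCall countC = new FnCall("count", Arrays.asList(false), false);

        long placeholderNdv = 1000L; // stands in for the "else compute it" branch
        System.out.println("count(*): " + defaultNdv(countStar, placeholderNdv));
        System.out.println("count(c): " + defaultNdv(countC, placeholderNdv));
    }
}
```

In this model {{count(*)}} gets NDV=1 only because an empty argument list passes the "all arguments are constant" test, while {{count(c)}} falls through to the computed value. That is the asymmetry called out in the second bullet of the list of problems.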
As it turns out, this seems to have been a known issue in the code:

{code:java}
// TODO: we can't correctly determine const-ness before analyzing 'fn_'. We should
// rework logic so that we do not call this function on unanalyzed exprs.
// Aggregate functions are never constant.
{code}

> count(*) correctly has NDV=1 via being labeled as constant
> ----------------------------------------------------------
>
>                 Key: IMPALA-7944
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7944
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>   Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor