Csaba Ringhofer created IMPALA-13052:
----------------------------------------
Summary: Sampling aggregate result sizes are underestimated
Key: IMPALA-13052
URL: https://issues.apache.org/jira/browse/IMPALA-13052
Project: IMPALA
Issue Type: Bug
Reporter: Csaba Ringhofer
Sampling aggregates (sample, appx_median, histogram) return a string that can
be quite large, but the planner assumes it to have a fixed small size.
Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by one host)
select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by one host)
select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by one host)
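To put the gap in perspective, here is a rough back-of-the-envelope sketch in Python that divides the observed bytes by the planned row size for the three examples above. It assumes that "KB" in the profile means 1024 bytes and that TotalBytesSent is dominated by the single intermediate row (it also includes some serialization overhead), so the factors are approximate. The names REPORTED and underestimation_factor are made up for this illustration and are not part of Impala.

```python
# Figures copied from the examples above: planner row-size (bytes) and
# observed TotalBytesSent (KB) for the single row sent by one host.
REPORTED = {
    "sample": (12, 254.45),
    "appx_median": (8, 254.68),
    "histogram": (12, 254.35),
}

KB = 1024  # assumption: the profile's "KB" means 1024 bytes


def underestimation_factor(planned_bytes, sent_kb):
    """Return how many times larger the sent data is than the planned row size."""
    return sent_kb * KB / planned_bytes


factors = {
    name: underestimation_factor(planned, sent)
    for name, (planned, sent) in REPORTED.items()
}

for name, factor in factors.items():
    print(f"{name}: ~{factor:,.0f}x larger than the planner estimate")
```

Under these assumptions the estimate is off by roughly four orders of magnitude in all three cases.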
This may also be relevant for the datasketches functions.