[jira] [Created] (IMPALA-13052) Sampling aggregate result sizes are underestimated

Csaba Ringhofer (Jira) Thu, 02 May 2024 02:27:54 -0700

Csaba Ringhofer created IMPALA-13052:
----------------------------------------


             Summary: Sampling aggregate result sizes are underestimated
                 Key: IMPALA-13052
                 URL: https://issues.apache.org/jira/browse/IMPALA-13052
             Project: IMPALA
          Issue Type: Bug
            Reporter: Csaba Ringhofer


Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)
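
To put the gap in perspective, here is a rough Python sketch that turns the numbers above into an underestimation factor. It assumes that "KB" in the profile means 1024 bytes and that TotalBytesSent is dominated by the single result row (it may include some transport overhead), so the ratios are only approximate. The helper name underestimation_factor is made up for this illustration and is not part of Impala.

```python
# Figures copied from the examples above.
# Assumption: "KB" in the runtime profile means 1024 bytes.
KB = 1024

# function name -> (planner row-size in bytes, TotalBytesSent in KB)
cases = {
    "sample": (12, 254.45),
    "appx_median": (8, 254.68),
    "histogram": (12, 254.35),
}


def underestimation_factor(planned_bytes, sent_kb):
    """Ratio of bytes actually sent to the planner's row-size estimate."""
    return sent_kb * KB / planned_bytes


for name, (planned_bytes, sent_kb) in cases.items():
    factor = underestimation_factor(planned_bytes, sent_kb)
    print(f"{name}: sent roughly {factor:,.0f}x the planned row size")
```

Under these assumptions, every case comes out more than four orders of magnitude above the planner's estimate.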

This may also be relevant for the datasketches functions.



