[ https://issues.apache.org/jira/browse/IMPALA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Becker resolved IMPALA-11067.
------------------------------------
Resolution: Fixed
> Unify struct subexpressions in rows
> -----------------------------------
>
> Key: IMPALA-11067
> URL: https://issues.apache.org/jira/browse/IMPALA-11067
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
> Labels: complextype, nested_types
>
> If a column appears multiple times in the select list, it is not duplicated
> in the row: we recognise that the result columns reference the same
> underlying column, so the row size does not increase. The baseline query,
> with each column selected once:
>
> {code:java}
> explain select id, outer_struct from
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct from
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> With the id column duplicated, the row size is still 64B:
>
> {code:java}
> explain select id, id, outer_struct from
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, id, outer_struct from
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.07MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=64B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
> However, if we query a struct and a subfield of the same struct, we do not
> reuse the existing slot in the row. Instead, the subexpression is
> duplicated, which increases the row size from 64B to 80B:
>
> {code:java}
> explain select id, outer_struct, outer_struct.inner_struct2 from
> functional_orc_def.complextypes_nested_structs;
> Query: explain select id, outer_struct, outer_struct.inner_struct2 from
> functional_orc_def.complextypes_nested_structs
> +---------------------------------------------------------------+
> | Explain String                                                |
> +---------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=4.09MB Threads=2    |
> | Per-Host Resource Estimates: Memory=20MB                      |
> | Codegen disabled by planner                                   |
> |                                                               |
> | PLAN-ROOT SINK                                                |
> | |                                                             |
> | 00:SCAN HDFS [functional_orc_def.complextypes_nested_structs] |
> |    HDFS partitions=1/1 files=1 size=1.18KB                    |
> |    row-size=80B cardinality=5                                 |
> +---------------------------------------------------------------+
> {code}
>
>
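The difference between the two behaviours described above can be sketched with a toy model. This is not Impala's planner code: the function names, the `unify_subfields` flag, the `other_fields` entry and all byte sizes are hypothetical, and the sizes were picked only so that the totals mirror the 64B and 80B row sizes in the plans above.

```python
# Toy schema: a leaf is a byte size (int), a struct is a dict of children.
# All sizes are made up for illustration; they are NOT Impala's real layout.
SCHEMA = {
    "id": 8,
    "outer_struct": {
        "other_fields": 40,   # hypothetical stand-in for the remaining fields
        "inner_struct2": 16,
    },
}


def size_of(node):
    """Total byte size of a leaf (int) or of a struct (dict of children)."""
    if isinstance(node, int):
        return node
    return sum(size_of(child) for child in node.values())


def lookup(schema, path):
    """Resolve a dotted path such as 'outer_struct.inner_struct2'."""
    node = schema
    for part in path.split("."):
        node = node[part]
    return node


def row_size(select_list, schema, unify_subfields):
    """Row size for a select list under the toy model.

    Exact duplicates always share one slot. If unify_subfields is True,
    a subfield whose enclosing struct is also selected reuses the bytes
    already materialised for that struct instead of getting its own slot.
    """
    unique = list(dict.fromkeys(select_list))  # drop exact duplicates
    if unify_subfields:
        unique = [
            p for p in unique
            if not any(p.startswith(q + ".") for q in unique)
        ]
    return sum(size_of(lookup(schema, p)) for p in unique)


query = ["id", "outer_struct", "outer_struct.inner_struct2"]
print(row_size(query, SCHEMA, unify_subfields=False))  # subfield duplicated
print(row_size(query, SCHEMA, unify_subfields=True))   # subfield reuses struct
```

In this model, filtering on the whole set of selected paths rather than in select-list order means the result does not depend on whether the subfield is listed before or after its enclosing struct.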