[
https://issues.apache.org/jira/browse/IMPALA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Becker resolved IMPALA-12783.
------------------------------------
Resolution: Fixed
> Nested struct with varlen data crashes
> --------------------------------------
>
> Key: IMPALA-12783
> URL: https://issues.apache.org/jira/browse/IMPALA-12783
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
>
> If a struct ("main") is within an array and contains two child structs ("s1"
> and "s2") that both contain strings (or other varlen data), Impala crashes
> when the struct is re-materialised (for example in a sort with limit) and
> codegen is enabled.
> To reproduce:
> In Hive:
> {code:java}
> create table nested (
>   arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>
> ) stored as parquet;
> insert into nested values (array(named_struct(
>   "s1", named_struct("str1", "A string that is long"),
>   "s2", named_struct("str2", "Another string that is long"))));{code}
> In Impala:
> {code:java}
> select 1, arr from nested order by 1 limit 1;{code}
> This seems to be because, in the codegen'd code, when checking whether the
> strings ("str1" and "str2" in the example) are NULL, we incorrectly calculate
> the offset of the null indicator byte from the memory address of their
> containing struct, not from the beginning of the "master tuple", which in
> this case is the item tuple of the array.
> Note that the null indicators of the struct members are at the end of the
> tuple containing the struct (recursively), i.e. the master tuple.
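
The offset mix-up described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Impala's actual code: `SlotInfo`, `Str1Slot`, `Str2Slot`, the 16-byte string slot size and the single trailing null indicator byte are all hypothetical and chosen to keep the arithmetic easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one string slot. Not an Impala class.
struct SlotInfo {
  std::size_t struct_offset;     // start of the containing struct, from the master tuple start
  std::size_t null_byte_offset;  // null indicator byte, from the master tuple start
  std::uint8_t null_bit_mask;    // bit inside that byte
};

// Assumed toy layout: [str1 slot][str2 slot][one null indicator byte].
constexpr std::size_t kStringSlotSize = 16;  // assumption for illustration only
constexpr std::size_t kTupleSize = 2 * kStringSlotSize + 1;

inline SlotInfo Str1Slot() { return SlotInfo{0, 2 * kStringSlotSize, 0x01}; }
inline SlotInfo Str2Slot() { return SlotInfo{kStringSlotSize, 2 * kStringSlotSize, 0x02}; }

// Intended calculation: the offset is relative to the master tuple.
inline std::size_t CorrectNullByteIndex(const SlotInfo& slot) {
  return slot.null_byte_offset;
}

// Calculation described in the report: the same offset is applied to the
// address of the containing struct instead of the master tuple.
inline std::size_t BuggyNullByteIndex(const SlotInfo& slot) {
  return slot.struct_offset + slot.null_byte_offset;
}

// NULL check using the intended calculation.
inline bool IsNull(const std::vector<std::uint8_t>& master_tuple, const SlotInfo& slot) {
  const std::size_t idx = CorrectNullByteIndex(slot);
  assert(idx < master_tuple.size());
  return (master_tuple[idx] & slot.null_bit_mask) != 0;
}
```

In this toy layout the two calculations agree for "str1", because "s1" starts at offset 0, but for "str2" the struct-relative calculation yields index 48 in a 33-byte tuple, which is a read past the end. Whether the real crash follows exactly this pattern is not confirmed by the report.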