Daniel Becker created IMPALA-12783:
--------------------------------------
Summary: Nested struct with varlen data crashes
Key: IMPALA-12783
URL: https://issues.apache.org/jira/browse/IMPALA-12783
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Daniel Becker
Assignee: Daniel Becker
If a struct ("main") is within an array and contains two child structs ("s1"
and "s2") that both contain strings (or other varlen data), Impala crashes when
the struct is re-materialised (for example in a sort with limit) and codegen is
enabled.
To reproduce:
In Hive:
{code:java}
create table nested (
  arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>
) stored as parquet;
insert into nested values (array(named_struct(
  "s1", named_struct("str1", "A string that is long"),
  "s2", named_struct("str2", "Another string that is long"))));{code}
In Impala:
{code:java}
select 1, arr from nested order by 1 limit 1;{code}
This seems to be because the codegen'd code, when checking whether the strings
("str1" and "str2" in the example) are NULL, incorrectly calculates the offset
of the null indicator byte from the memory address of their containing struct
rather than from the beginning of the "master tuple", which in this case is the
item tuple of the array.
Note that the null indicators of the struct members are at the end of the tuple
containing the struct (recursively), i.e. the master tuple.
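
The suspected miscalculation can be illustrated with a minimal sketch. The
layout, offsets, and function names below are hypothetical and chosen only for
illustration; they are not Impala's actual tuple layout or codegen output. The
sketch assumes a master tuple holding the slots of "s1" and "s2" followed by
one null indicator byte, and shows what happens when the null byte offset is
applied to the start of a child struct instead of the start of the master
tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration only; not Impala's real descriptors.
constexpr std::size_t kStr1Offset = 0;       // s1 (and its slot str1) starts here
constexpr std::size_t kStr2Offset = 16;      // s2 (and its slot str2) starts here
constexpr std::size_t kNullByteOffset = 32;  // null indicators, end of master tuple
constexpr std::uint8_t kStr2NullBit = 0x02;  // assumed bit for str2

// Intended behaviour: the offset is relative to the master tuple.
bool IsNullFromMasterTuple(const std::uint8_t* master_tuple, std::uint8_t bit) {
  return (master_tuple[kNullByteOffset] & bit) != 0;
}

// Suspected bug: the same offset is applied to the containing struct's address.
bool IsNullFromStructStart(const std::uint8_t* struct_start, std::uint8_t bit) {
  return (struct_start[kNullByteOffset] & bit) != 0;
}

struct DemoResult {
  bool correct_str2_is_null;
  bool buggy_str2_is_null;
};

DemoResult RunDemo() {
  // 33 bytes of tuple plus padding that stands in for unrelated memory, so
  // that the wrong read stays in bounds in this demo.
  std::vector<std::uint8_t> memory(64, 0);
  assert(kStr2Offset + kNullByteOffset < memory.size());
  memory[kStr2Offset + kNullByteOffset] = 0xFF;  // unrelated data after the tuple

  const std::uint8_t* master_tuple = memory.data();
  const std::uint8_t* s2_start = master_tuple + kStr2Offset;

  return {IsNullFromMasterTuple(master_tuple, kStr2NullBit),
          IsNullFromStructStart(s2_start, kStr2NullBit)};
}
```

Calling RunDemo() in this sketch reports str2 as non-NULL with the master-tuple
calculation and as NULL with the struct-relative one, because the latter reads
a byte past the end of the tuple. Note that in this assumed layout "s1" starts
at offset 0, so both calculations happen to agree for str1; only the second
child struct exposes the difference.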