[
https://issues.apache.org/jira/browse/IMPALA-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871107#comment-17871107
]
Daniel Becker commented on IMPALA-13272:
----------------------------------------
The bug is present even before IMPALA-12159. The original repro query didn't
fail before that change only because it contains var-len types within the
array: since those were not allowed yet, the bug was "accidentally" avoided.
With a query that has no var-len types in the array, the bug can also be
reproduced before IMPALA-12159.
From Hive:
{code:sql}
-- Array of structs containing only fixed-length types (no var-len fields).
create table arr_fixed (
  arr_contains_nested_struct ARRAY<STRUCT<
    inner_struct1: STRUCT<b: BIGINT, l: INT>,
    inner_struct2: STRUCT<b: BIGINT, l: INT>,
    small: SMALLINT>>
) stored as parquet;{code}
{code:sql}
-- One row whose array holds two structs and a NULL element.
insert into arr_fixed values (
  array(
    named_struct("inner_struct1", named_struct("b", 10L, "l", 0),
                 "inner_struct2", named_struct("b", 100L, "l", 2),
                 "small", 2S),
    NULL,
    named_struct("inner_struct1", named_struct("b", NULL, "l", 5),
                 "inner_struct2", named_struct("b", 1000L, "l", 8),
                 "small", 20S)));{code}
Then from Impala:
{code:sql}
-- arr.small is selected in the inline view but not used by the outer query.
select
  row_no
from (
  select
    arr.small,
    row_number() over (
      order by arr.inner_struct1.b) as row_no
  from arr_fixed t, t.arr_contains_nested_struct arr
) res;{code}
This query hits the same error, so the bug should be attributed to
IMPALA-12019 ("Support ORDER BY for collections of fixed length types in
select list").
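For comparison, the issue description below reports that the crash goes away
when arr.small is removed from the inline view or is also selected by the
outer query. Assuming that workaround carries over to the fixed-length table
above (an assumption, not verified here), these two variants of the repro
query are expected not to hit the DCHECK; the alias "small" in the second
variant is added only for illustration:
{code:sql}
-- Variant 1 (unverified): arr.small removed from the inline view.
select
  row_no
from (
  select
    row_number() over (
      order by arr.inner_struct1.b) as row_no
  from arr_fixed t, t.arr_contains_nested_struct arr
) res;

-- Variant 2 (unverified): arr.small also used by the outer query.
select
  small,
  row_no
from (
  select
    arr.small as small,
    row_number() over (
      order by arr.inner_struct1.b) as row_no
  from arr_fixed t, t.arr_contains_nested_struct arr
) res;{code}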
> Analytic function of collections can lead to crash
> --------------------------------------------------
>
> Key: IMPALA-13272
> URL: https://issues.apache.org/jira/browse/IMPALA-13272
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.4.0
> Reporter: Csaba Ringhofer
> Assignee: Daniel Becker
> Priority: Critical
>
> Using Impala's test data, the following query hits a DCHECK in debug builds
> and may cause more subtle issues in RELEASE builds:
> {code}
> select
>   row_no
> from (
>   select
>     arr.small,
>     row_number() over (
>       order by arr.inner_struct1.str) as row_no
>   from functional_parquet.collection_struct_mix t,
>     t.arr_contains_nested_struct arr
> ) res
> {code}
> The following DCHECK is hit:
> {code}
> tuple.h:296 Check failed: offset != -1
> {code}
> The problem seems to be with arr.small, which is referenced in the inline
> view but not used in the outer query: removing it from the inline view, or
> adding it to the outer select, avoids the bug. The problem seems related to
> materialization: offset == -1 means that the slot is not materialized, yet
> the Parquet scanner still tries to materialize it.
> It is not clear yet which commit introduced the bug or whether this is a bug
> in the planner or the backend.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)