[ 
https://issues.apache.org/jira/browse/IMPALA-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871107#comment-17871107
 ] 

Daniel Becker commented on IMPALA-13272:
----------------------------------------

The bug was already present before IMPALA-12159. The original repro query did 
not fail before that change only because it contains var-len types within the 
array; since those were not allowed yet, the bug was "accidentally" avoided. 
With a different query that has no var-len types in the array, the bug can 
also be reproduced before IMPALA-12159.

From Hive:
{code:java}
create table arr_fixed (
  arr_contains_nested_struct ARRAY<STRUCT<
    inner_struct1: STRUCT<b: BIGINT, l: INT>,
    inner_struct2: STRUCT<b: BIGINT, l: INT>,
    small: SMALLINT>>
) stored as parquet;{code}
{code:java}
insert into arr_fixed values (
  array(named_struct("inner_struct1", named_struct("b", 10L, "l", 0),
                     "inner_struct2", named_struct("b", 100L, "l", 2),
                     "small", 2S),
        NULL,
        named_struct("inner_struct1", named_struct("b", NULL, "l", 5),
                     "inner_struct2", named_struct("b", 1000L, "l", 8),
                     "small", 20S)));{code}
Then from Impala:
{code:java}
select
  row_no
from (
       select
         arr.small,
         row_number() over (
          order by arr.inner_struct1.b) as row_no
       from arr_fixed t, t.arr_contains_nested_struct arr
     ) res;{code}
This query produces the same error, so we should say the bug was introduced by 
IMPALA-12019 ("Support ORDER BY for collections of fixed length types in select 
list").

 

> Analytic function of collections can lead to crash
> --------------------------------------------------
>
>                 Key: IMPALA-13272
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13272
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.4.0
>            Reporter: Csaba Ringhofer
>            Assignee: Daniel Becker
>            Priority: Critical
>
> Using Impala's test data the following query leads to DCHECK in debug builds 
> and may cause more subtle issues in RELEASE builds:
> {code}
> select
>   row_no
> from (
>          select
>                arr.small,
>                row_number() over (
>                 order by arr.inner_struct1.str) as row_no
>          from functional_parquet.collection_struct_mix t, 
> t.arr_contains_nested_struct arr
>        ) res
> {code}
> The following DCHECK is hit:
> {code}
> tuple.h:296 Check failed: offset != -1
> {code}
> The problem seems to be with arr.small, which is referenced in the inline 
> view but not used in the outer query: removing it from the inline view or 
> adding it to the outer select avoids the bug. The problem seems related to 
> materialization: offset == -1 means that the slot is not materialized, but 
> the Parquet scanner still tries to materialize it.
> It is not clear yet which commit introduced the bug or whether this is a bug 
> in the planner or the backend. 


