[ https://issues.apache.org/jira/browse/IMPALA-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231815#comment-17231815 ]

Abhishek Rawat commented on IMPALA-9338:
----------------------------------------

I am able to repro this using the tpch data set from the minicluster.

Here is the case where we generate a bad plan:
{code:java}
explain select * from tpch.customer c left outer join tpch.lineitem l ON 
c.c_custkey = l.l_orderkey and c.C_CUSTKEY = l.L_ORDERKEY;
Query: explain select * from tpch.customer c left outer join tpch.lineitem l ON 
c.c_custkey = l.l_orderkey and c.C_CUSTKEY = l.L_ORDERKEY
+----------------------------------------------------------------------------+
| Explain String                                                             |
+----------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=37.00MB Threads=6                |
| Per-Host Resource Estimates: Memory=381MB                                  |
|                                                                            |
| PLAN-ROOT SINK                                                             |
| |                                                                          |
| 05:EXCHANGE [UNPARTITIONED]                                                |
| |                                                                          |
| 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                               |
| |  hash predicates: c.C_CUSTKEY = l.L_ORDERKEY, c.C_CUSTKEY = l.L_ORDERKEY |
| |  runtime filters: RF000 <- c.C_CUSTKEY, RF001 <- c.C_CUSTKEY             |
| |  row-size=448B cardinality=150.00K                                       |
| |                                                                          |
| |--04:EXCHANGE [HASH(l.L_ORDERKEY,l.L_ORDERKEY)]                           |
| |  |                                                                       |
| |  00:SCAN HDFS [tpch.customer c]                                          |
| |     HDFS partitions=1/1 files=1 size=23.08MB                             |
| |     row-size=218B cardinality=150.00K                                    |
| |                                                                          |
| 03:EXCHANGE [HASH(c.C_CUSTKEY,c.C_CUSTKEY)]                                |
| |                                                                          |
| 01:SCAN HDFS [tpch.lineitem l]                                             |
|    HDFS partitions=1/1 files=1 size=718.94MB                               |
|    runtime filters: RF000 -> l.L_ORDERKEY, RF001 -> l.L_ORDERKEY           |
|    row-size=231B cardinality=6.00M                                         |
+----------------------------------------------------------------------------+
{code}
Notice that both EXCHANGE(04) and EXCHANGE(03) above the SCAN nodes reference 
the wrong columns: EXCHANGE(04) sits above the customer scan but hashes on 
l.L_ORDERKEY, while EXCHANGE(03) sits above the lineitem scan but hashes on 
c.C_CUSTKEY.

The following query, however, generates the proper plan.
{code:java}
explain select * from tpch.customer c left outer join tpch.lineitem l ON 
c.c_custkey = l.l_orderkey and c.c_custkey = l.l_orderkey;
Query: explain select * from tpch.customer c left outer join tpch.lineitem l ON 
c.c_custkey = l.l_orderkey and c.c_custkey = l.l_orderkey
+----------------------------------------------------------------------------+
| Explain String                                                             |
+----------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=37.00MB Threads=6                |
| Per-Host Resource Estimates: Memory=381MB                                  |
|                                                                            |
| PLAN-ROOT SINK                                                             |
| |                                                                          |
| 05:EXCHANGE [UNPARTITIONED]                                                |
| |                                                                          |
| 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                               |
| |  hash predicates: l.l_orderkey = c.c_custkey, l.l_orderkey = c.c_custkey |
| |  runtime filters: RF000 <- c.c_custkey, RF001 <- c.c_custkey             |
| |  row-size=448B cardinality=150.00K                                       |
| |                                                                          |
| |--04:EXCHANGE [HASH(c.c_custkey,c.c_custkey)]                             |
| |  |                                                                       |
| |  00:SCAN HDFS [tpch.customer c]                                          |
| |     HDFS partitions=1/1 files=1 size=23.08MB                             |
| |     row-size=218B cardinality=150.00K                                    |
| |                                                                          |
| 03:EXCHANGE [HASH(l.l_orderkey,l.l_orderkey)]                              |
| |                                                                          |
| 01:SCAN HDFS [tpch.lineitem l]                                             |
|    HDFS partitions=1/1 files=1 size=718.94MB                               |
|    runtime filters: RF000 -> l.l_orderkey, RF001 -> l.l_orderkey           |
|    row-size=231B cardinality=6.00M                                         |
+----------------------------------------------------------------------------+
{code}
The root issue seems to be that when we have duplicate predicates whose column 
references differ only in case, the code treats them as different predicates. 
For instance, in the first query above, `c.c_custkey = l.l_orderkey` and 
`c.C_CUSTKEY = l.L_ORDERKEY` are treated as different predicates; when the 
join is flipped, the predicates get flipped twice, and each leg of the hash 
join (HSJOIN) ends up referencing the incorrect side of the predicate.

 

If, however, the duplicate predicates use the same case, we treat them as the 
same predicate and end up with the proper plan.
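
For illustration, here is a minimal standalone sketch (this is not Impala 
planner code; keying duplicate detection on the raw predicate text, and all 
class and variable names below, are assumptions used only to show how 
identifier case can affect duplicate detection):
{code:java}
// Hypothetical sketch, not Impala code: shows that keying duplicate detection
// on the raw predicate text keeps both differently-cased conjuncts, while
// keying on a case-normalized form collapses them into one.
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class DuplicatePredicateSketch {
  public static void main(String[] args) {
    String p1 = "c.c_custkey = l.l_orderkey";
    String p2 = "c.C_CUSTKEY = l.L_ORDERKEY";

    // Case-sensitive keying: both conjuncts survive as "different" predicates.
    Set<String> rawKeys = new LinkedHashSet<>();
    rawKeys.add(p1);
    rawKeys.add(p2);
    System.out.println("raw keys: " + rawKeys.size());               // prints 2

    // Case-normalized keying (SQL identifiers are case-insensitive):
    // the second conjunct is recognized as a duplicate of the first.
    Set<String> normalizedKeys = new LinkedHashSet<>();
    normalizedKeys.add(p1.toLowerCase(Locale.ROOT));
    normalizedKeys.add(p2.toLowerCase(Locale.ROOT));
    System.out.println("normalized keys: " + normalizedKeys.size()); // prints 1
  }
}
{code}
The raw set keeps both strings (size 2), while the normalized set collapses 
them into one (size 1), mirroring the different-case vs. same-case behavior 
described above.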

 

> Impala crashing in impala::RowDescriptor::TupleIsNullable(int)
> --------------------------------------------------------------
>
>                 Key: IMPALA-9338
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9338
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Abhishek Rawat
>            Assignee: Yida Wu
>            Priority: Blocker
>              Labels: crash
>
> Repro:
> {code:java}
> create database default;
> CREATE EXTERNAL TABLE default.dimension ( ssn_id INT, act_num CHAR(1), eff_dt 
> CHAR(10), seq_num SMALLINT, entry_dt CHAR(10), map ARRAY<INT>, src CHAR(10), 
> msg CHAR(1), msg_num CHAR(3), remarks CHAR(3), description CHAR(26), 
> default_load_ts CHAR(26), map_cd VARCHAR(50) ) PARTITIONED BY ( year INT, 
> ssn_hash INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C' WITH 
> SERDEPROPERTIES ('colelction.delim'=',', 'field.delim'='\u001C', 
> 'serialization.format'='\u001C') STORED AS PARQUET --LOCATION 
> 'hdfs://prdnameservice/user/hive/warehouse/default.db/dimension' 
> TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true', 
> 'impala.lastComputeStatsTime'='1579246708', 'last_modified_by'='a00811p', 
> 'last_modified_time'='1489791214', 'numRows'='7357715311', 
> 'totalSize'='235136295799');
> CREATE EXTERNAL TABLE default.fact ( ssn_id_n INT, bor_act_sfx CHAR(1), 
> start_dt CHAR(10), seq_num SMALLINT, msg_n CHAR(8), end_dt CHAR(10), reviews 
> CHAR(50), description CHAR(50), detail CHAR(50), default_load_ts CHAR(26) ) 
> PARTITIONED BY ( year INT, ssn_hash INT ) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\u0016' WITH SERDEPROPERTIES ('field.delim'='\u0016', 
> 'serialization.format'='\u0016') STORED AS PARQUET --LOCATION 
> 'hdfs://prdnameservice/user/hive/warehouse/default.db/fact' TBLPROPERTIES 
> ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true', 
> 'impala.lastComputeStatsTime'='1579242111', 'last_modified_by'='e32940', 
> 'last_modified_time'='1484186332', 'numRows'='5142832439', 
> 'totalSize'='105397898347'); 
> use default;
> select ssn_id_n, bor_act_sfx, amap.item, start_dt, reviews, concat(msg, 
> msg_num) corr_code from dimension, dimension.map amap LEFT JOIN fact ON 
> dimension.ssn_id = fact.ssn_id_n AND dimension.act_num = fact.bor_act_sfx AND 
> dimension.eff_dt = fact.start_dt and dimension.year = fact.year --and 
> dimension.month(cast(eff_dt as timestamp)) = fact.month(cast(start_dt as 
> timestamp)) AND dimension.YEAR = fact.YEAR AND fact.year in (2018,2019) where 
> dimension.msg like '%B295%' AND dimension.year in (2018,2019);{code}
> Stack Trace:
> {code:java}
> #0 0x0000000000f8b1b9 in impala::RowDescriptor::TupleIsNullable(int) const () 
> #1 0x000000000130911f in impala::SlotRef::Init(impala::RowDescriptor const&, 
> impala::RuntimeState*) () 
> #2 0x000000000130748e in impala::ScalarExpr::Create(impala::TExpr const&, 
> impala::RowDescriptor const&, impala::RuntimeState*, impala::ObjectPool*, 
> impala::ScalarExpr**) () 
> #3 0x00000000013075e5 in 
> impala::ScalarExpr::Create(std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> > const&, impala::RowDescriptor const&, 
> impala::RuntimeState*, impala::ObjectPool*, std::vector<impala::ScalarExpr*, 
> std::allocator<impala::ScalarExpr*> >*) () 
> #4 0x000000000130769f in 
> impala::ScalarExpr::Create(std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> > const&, impala::RowDescriptor const&, 
> impala::RuntimeState*, std::vector<impala::ScalarExpr*, 
> std::allocator<impala::ScalarExpr*> >*) () 
> #5 0x000000000149c1aa in 
> impala::KrpcDataStreamSender::Init(std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> > const&, impala::TDataSink const&, 
> impala::RuntimeState*) () 
> #6 0x0000000001208ad3 in impala::DataSink::Create(impala::TPlanFragmentCtx 
> const&, impala::TPlanFragmentInstanceCtx const&, impala::RowDescriptor 
> const*, impala::RuntimeState*, impala::DataSink**) () 
> #7 0x0000000000fac9a4 in impala::FragmentInstanceState::Prepare() () 
> #8 0x0000000000fad3dd in impala::FragmentInstanceState::Exec() () 
> #9 0x0000000000f98e77 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) () 
> #10 0x00000000011a1490 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) () 
> #11 0x00000000011a203a in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (std::string const&, std::string const&, boost::function<void ()>, 
> impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>), boost::_bi::list5<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() () 
> #12 0x00000000017909ca in thread_proxy ()
> #13 0x00007f8832fa6aa1 in __pthread_initialize_minimal_internal () from /lib64/libpthread.so.0
> #14 0x0000000000000000 in ?? ()
> {code}
>  
> The crash only happens when the ROJ plan is selected. If the LOJ plan is 
> selected, the query runs successfully.
> Initial investigation indicates that the scalar expression being constructed 
> in the above stack trace references an invalid tupleId in the row 
> descriptor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
