[ 
https://issues.apache.org/jira/browse/HIVE-29084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007342#comment-18007342
 ] 

Konstantin Bereznyakov commented on HIVE-29084:
-----------------------------------------------

uzing Tez, EXPLAN with CBO ON/OFF:


{code:java}
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 4.2.0-SNAPSHOT)
Driver: Hive JDBC (version 4.2.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.2.0-SNAPSHOT by Apache Hive
0: jdbc:hive2://localhost:10000> set hive.execution.engine;
+----------------------------+
|            set             |
+----------------------------+
| hive.execution.engine=tez  |
+----------------------------+
1 row selected (0.055 seconds)
0: jdbc:hive2://localhost:10000> SET hive.cbo.enable;
+-----------------------+
|          set          |
+-----------------------+
| hive.cbo.enable=true  |
+-----------------------+
1 row selected (0.009 seconds)
0: jdbc:hive2://localhost:10000> EXPLAIN SELECT first_val, second_val
. . . . . . . . . . . . . . . .> FROM (SELECT array('a', 'b') as val_array) 
inline_data
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv1 AS 
first_val
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv2 AS 
second_val
. . . . . . . . . . . . . . . .> WHERE first_val != second_val;
INFO  : Compiling 
command(queryId=root_20250715215842_dfc474bd-e678-4c33-817d-5f795e61c64b): 
EXPLAIN SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=root_20250715215842_dfc474bd-e678-4c33-817d-5f795e61c64b); Time 
taken: 0.195 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=root_20250715215842_dfc474bd-e678-4c33-817d-5f795e61c64b): 
EXPLAIN SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=root_20250715215842_dfc474bd-e678-4c33-817d-5f795e61c64b); Time 
taken: 0.042 seconds
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Select Operator [SEL_14]                       |
|       Output:["_col0","_col1"]                     |
|       Lateral View Join Operator [LVJ_12]          |
|         Output:["_col1","_col2"]                   |
|         Select Operator [SEL_9]                    |
|           Output:["_col1"]                         |
|           Lateral View Forward [LVF_8]             |
|             Lateral View Join Operator [LVJ_6]     |
|               Output:["_col0","_col1"]             |
|               Select Operator [SEL_3]              |
|                 Output:["_col0"]                   |
|                 Lateral View Forward [LVF_2]       |
|                   Select Operator [SEL_1]          |
|                     Output:["_col0"]               |
|                     TableScan [TS_0]               |
|     Select Operator [SEL_14]                       |
|       Output:["_col0","_col1"]                     |
|       Lateral View Join Operator [LVJ_12]          |
|         Output:["_col1","_col2"]                   |
|         UDTF Operator [UDTF_11]                    |
|           function name:explode                    |
|           Select Operator [SEL_10]                 |
|             Output:["_col0"]                       |
|              Please refer to the previous Lateral View Forward [LVF_8] |
|     Select Operator [SEL_14]                       |
|       Output:["_col0","_col1"]                     |
|       Lateral View Join Operator [LVJ_12]          |
|         Output:["_col1","_col2"]                   |
|         Select Operator [SEL_9]                    |
|           Output:["_col1"]                         |
|           Lateral View Forward [LVF_8]             |
|             Lateral View Join Operator [LVJ_6]     |
|               Output:["_col0","_col1"]             |
|               UDTF Operator [UDTF_5]               |
|                 function name:explode              |
|                 Select Operator [SEL_4]            |
|                   Output:["_col0"]                 |
|                    Please refer to the previous Lateral View Forward [LVF_2] |
|     Select Operator [SEL_14]                       |
|       Output:["_col0","_col1"]                     |
|       Lateral View Join Operator [LVJ_12]          |
|         Output:["_col1","_col2"]                   |
|         UDTF Operator [UDTF_11]                    |
|           function name:explode                    |
|           Select Operator [SEL_10]                 |
|             Output:["_col0"]                       |
|              Please refer to the previous Lateral View Forward [LVF_8] |
|                                                    |
+----------------------------------------------------+
53 rows selected (0.259 seconds)
0: jdbc:hive2://localhost:10000>   SET hive.cbo.enable=false;
No rows affected (0.013 seconds)
0: jdbc:hive2://localhost:10000>   SET hive.cbo.enable;
+------------------------+
|          set           |
+------------------------+
| hive.cbo.enable=false  |
+------------------------+
1 row selected (0.009 seconds)
0: jdbc:hive2://localhost:10000> EXPLAIN SELECT first_val, second_val
. . . . . . . . . . . . . . . .> FROM (SELECT array('a', 'b') as val_array) 
inline_data
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv1 AS 
first_val
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv2 AS 
second_val
. . . . . . . . . . . . . . . .> WHERE first_val != second_val;
INFO  : Compiling 
command(queryId=root_20250715215924_64920040-2823-4c01-b3ea-f5099d4edb82): 
EXPLAIN SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=root_20250715215924_64920040-2823-4c01-b3ea-f5099d4edb82); Time 
taken: 0.109 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=root_20250715215924_64920040-2823-4c01-b3ea-f5099d4edb82): 
EXPLAIN SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=root_20250715215924_64920040-2823-4c01-b3ea-f5099d4edb82); Time 
taken: 0.036 seconds
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Select Operator [SEL_13]                       |
|       Output:["_col0","_col1"]                     |
|       Filter Operator [FIL_15]                     |
|         predicate:(_col1 <> _col2)                 |
|         Lateral View Join Operator [LVJ_11]        |
|           Output:["_col1","_col2"]                 |
|           Select Operator [SEL_8]                  |
|             Output:["_col1"]                       |
|             Lateral View Forward [LVF_7]           |
|               Lateral View Join Operator [LVJ_6]   |
|                 Output:["_col0","_col1"]           |
|                 Select Operator [SEL_3]            |
|                   Output:["_col0"]                 |
|                   Lateral View Forward [LVF_2]     |
|                     Select Operator [SEL_1]        |
|                       Output:["_col0"]             |
|                       TableScan [TS_0]             |
|     Select Operator [SEL_13]                       |
|       Output:["_col0","_col1"]                     |
|       Filter Operator [FIL_15]                     |
|         predicate:(_col1 <> _col2)                 |
|         Lateral View Join Operator [LVJ_11]        |
|           Output:["_col1","_col2"]                 |
|           UDTF Operator [UDTF_10]                  |
|             function name:explode                  |
|             Select Operator [SEL_9]                |
|               Output:["_col0"]                     |
|                Please refer to the previous Lateral View Forward [LVF_7] |
|     Select Operator [SEL_13]                       |
|       Output:["_col0","_col1"]                     |
|       Filter Operator [FIL_15]                     |
|         predicate:(_col1 <> _col2)                 |
|         Lateral View Join Operator [LVJ_11]        |
|           Output:["_col1","_col2"]                 |
|           Select Operator [SEL_8]                  |
|             Output:["_col1"]                       |
|             Lateral View Forward [LVF_7]           |
|               Lateral View Join Operator [LVJ_6]   |
|                 Output:["_col0","_col1"]           |
|                 UDTF Operator [UDTF_5]             |
|                   function name:explode            |
|                   Select Operator [SEL_4]          |
|                     Output:["_col0"]               |
|                      Please refer to the previous Lateral View Forward 
[LVF_2] |
|     Select Operator [SEL_13]                       |
|       Output:["_col0","_col1"]                     |
|       Filter Operator [FIL_15]                     |
|         predicate:(_col1 <> _col2)                 |
|         Lateral View Join Operator [LVJ_11]        |
|           Output:["_col1","_col2"]                 |
|           UDTF Operator [UDTF_10]                  |
|             function name:explode                  |
|             Select Operator [SEL_9]                |
|               Output:["_col0"]                     |
|                Please refer to the previous Lateral View Forward [LVF_7] |
|                                                    |
+----------------------------------------------------+
59 rows selected (0.159 seconds)
{code}

> CBO returns incorrect results when WHERE clause filters multiple LATERAL VIEW 
> explode cross-products
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29084
>                 URL: https://issues.apache.org/jira/browse/HIVE-29084
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.1.0, 4.0.1
>         Environment:   - Affected Versions: Hive 4.0.1, 4.2.0-SNAPSHOT 
> (likely all CBO+lateral view versions)
>   - Execution Engine: MapReduce (local mode)
>   - Metastore: Derby embedded
>            Reporter: Konstantin Bereznyakov
>            Assignee: Konstantin Bereznyakov
>            Priority: Major
>         Attachments: hive_cbo_lateral_view_where_bug.log, 
> hive_cbo_lateral_view_where_bug.sql
>
>
> When CBO is enabled, queries with multiple LATERAL VIEW explode operations 
> and WHERE clause filtering return incorrect results. The WHERE clause 
> selectivity is not properly applied during optimization, causing the query to 
> return rows that should be filtered out.
> I can reliably reproduce the bug with 2+ occurrences of "LATERAL VIEW 
> explode()" *and* a WHERE clause; it does not seem to happen with only one 
> "LATERAL VIEW explode()" or without a WHERE clause.
> Originally reproduced with published 4.0.1 container, then verified with 
> locally built current master branch.
> I was *not* able to reproduce it with unit tests.
> {{*Minimal Reproduction Case:*}}
> {{  }}
> {{  DROP TABLE IF EXISTS cbo_lateral_view_where_bug;}}
> {{  CREATE TABLE cbo_lateral_view_where_bug (values array<string>);}}
> {{  INSERT INTO cbo_lateral_view_where_bug SELECT array('a', 'b');}}
> {{  ANALYZE TABLE cbo_lateral_view_where_bug COMPUTE STATISTICS;}}{{  **  }}
> {{  *-- Correct result (CBO OFF)*}}
> {{  SET hive.cbo.enable=false;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{  WHERE first_val != second_val;}}{{  *-- Incorrect result (CBO ON)*}}
> {{  SET hive.cbo.enable=true;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{  WHERE first_val != second_val;}}{{  **  }}
> {{  *–- Expected vs Actual Results:*}}
> {{  - CBO OFF (correct): 2 rows: ('a','b') and ('b','a')}}
> {{  - CBO ON (incorrect): 4 rows: ('a','b'), ('b','a'), ('a','a'), ('b','b')}}
>   *Root Cause:*
>   CBO fails to properly estimate the selectivity of WHERE clauses that filter 
> cross-products created by multiple lateral views on the same array. The 
> optimizer appears to ignore or miscalculate the filtering effect, allowing 
> rows that should be excluded.
>   *Impact:*
>   - Data Correctness: Queries return wrong results, not just performance 
> issues
>   - Scope: Requires 2+ LATERAL VIEW explode operations with WHERE filtering
>   - Severity: Silent data corruption - users may not notice incorrect results
>   {*}Workaround{*}:
> Set {*}hive.cbo.enable{*}=false for queries with multiple lateral view 
> cross-products and WHERE clause filtering.
>  
> Attachments: 
> - [^hive_cbo_lateral_view_where_bug.sql] - SQL to reproduce
> - Hive log: [^hive_cbo_lateral_view_where_bug.log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to