[ 
https://issues.apache.org/jira/browse/HIVE-29084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007343#comment-18007343
 ] 

Konstantin Bereznyakov commented on HIVE-29084:
-----------------------------------------------

uzing Tez, EXPLAN EXTENDED with CBO ON/OFF:
{code}
SET hive.cbo.enable=true;
No rows affected (0.007 seconds)
0: jdbc:hive2://localhost:10000>   SET hive.cbo.enable;
+-----------------------+
|          set          |
+-----------------------+
| hive.cbo.enable=true  |
+-----------------------+
1 row selected (0.01 seconds)
0: jdbc:hive2://localhost:10000> EXPLAIN EXTENDED SELECT first_val, second_val
. . . . . . . . . . . . . . . .> FROM (SELECT array('a', 'b') as val_array) 
inline_data
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv1 AS 
first_val
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv2 AS 
second_val
. . . . . . . . . . . . . . . .> WHERE first_val != second_val;
INFO  : Compiling 
command(queryId=root_20250715220200_00940b48-adba-4296-ae50-75b5e92a0c7b): 
EXPLAIN EXTENDED SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=root_20250715220200_00940b48-adba-4296-ae50-75b5e92a0c7b); Time 
taken: 0.216 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=root_20250715220200_00940b48-adba-4296-ae50-75b5e92a0c7b): 
EXPLAIN EXTENDED SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=root_20250715220200_00940b48-adba-4296-ae50-75b5e92a0c7b); Time 
taken: 0.031 seconds
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage                          |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         TableScan                                  |
|           alias: _dummy_table                      |
|           Row Limit Per Split: 1                   |
|           GatherStats: false                       |
|           Select Operator                          |
|             expressions: array('a','b') (type: array<string>) |
|             outputColumnNames: _col0               |
|             Lateral View Forward                   |
|               Select Operator                      |
|                 expressions: _col0 (type: array<string>) |
|                 outputColumnNames: _col0           |
|                 Lateral View Join Operator         |
|                   outputColumnNames: _col0, _col1  |
|                   Lateral View Forward             |
|                     Select Operator                |
|                       expressions: _col1 (type: string) |
|                       outputColumnNames: _col1     |
|                       Lateral View Join Operator   |
|                         outputColumnNames: _col1, _col2 |
|                         Select Operator            |
|                           expressions: _col1 (type: string), _col2 (type: 
string) |
|                           outputColumnNames: _col0, _col1 |
|                           ListSink                 |
|                     Select Operator                |
|                       expressions: _col0 (type: array<string>) |
|                       outputColumnNames: _col0     |
|                       UDTF Operator                |
|                         function name: explode     |
|                         Lateral View Join Operator |
|                           outputColumnNames: _col1, _col2 |
|                           Select Operator          |
|                             expressions: _col1 (type: string), _col2 (type: 
string) |
|                             outputColumnNames: _col0, _col1 |
|                             ListSink               |
|               Select Operator                      |
|                 expressions: Const array<string> [a, b] (type: array<string>) 
|
|                 outputColumnNames: _col0           |
|                 UDTF Operator                      |
|                   function name: explode           |
|                   Lateral View Join Operator       |
|                     outputColumnNames: _col0, _col1 |
|                     Lateral View Forward           |
|                       Select Operator              |
|                         expressions: _col1 (type: string) |
|                         outputColumnNames: _col1   |
|                         Lateral View Join Operator |
|                           outputColumnNames: _col1, _col2 |
|                           Select Operator          |
|                             expressions: _col1 (type: string), _col2 (type: 
string) |
|                             outputColumnNames: _col0, _col1 |
|                             ListSink               |
|                       Select Operator              |
|                         expressions: _col0 (type: array<string>) |
|                         outputColumnNames: _col0   |
|                         UDTF Operator              |
|                           function name: explode   |
|                           Lateral View Join Operator |
|                             outputColumnNames: _col1, _col2 |
|                             Select Operator        |
|                               expressions: _col1 (type: string), _col2 (type: 
string) |
|                               outputColumnNames: _col0, _col1 |
|                               ListSink             |
|                                                    |
+----------------------------------------------------+
71 rows selected (0.265 seconds)
0: jdbc:hive2://localhost:10000>   SET hive.cbo.enable=false;
No rows affected (0.005 seconds)
0: jdbc:hive2://localhost:10000>   SET hive.cbo.enable;
+------------------------+
|          set           |
+------------------------+
| hive.cbo.enable=false  |
+------------------------+
1 row selected (0.007 seconds)
0: jdbc:hive2://localhost:10000> EXPLAIN EXTENDED SELECT first_val, second_val
. . . . . . . . . . . . . . . .> FROM (SELECT array('a', 'b') as val_array) 
inline_data
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv1 AS 
first_val
. . . . . . . . . . . . . . . .> LATERAL VIEW explode(val_array) lv2 AS 
second_val
. . . . . . . . . . . . . . . .> WHERE first_val != second_val;
INFO  : Compiling 
command(queryId=root_20250715220220_a4c8999e-6aa2-4546-8aa0-d5c5128a5d80): 
EXPLAIN EXTENDED SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=root_20250715220220_a4c8999e-6aa2-4546-8aa0-d5c5128a5d80); Time 
taken: 0.095 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=root_20250715220220_a4c8999e-6aa2-4546-8aa0-d5c5128a5d80): 
EXPLAIN EXTENDED SELECT first_val, second_val
FROM (SELECT array('a', 'b') as val_array) inline_data
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=root_20250715220220_a4c8999e-6aa2-4546-8aa0-d5c5128a5d80); Time 
taken: 0.034 seconds
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage                          |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         TableScan                                  |
|           alias: _dummy_table                      |
|           Row Limit Per Split: 1                   |
|           GatherStats: false                       |
|           Select Operator                          |
|             expressions: array('a','b') (type: array<string>) |
|             outputColumnNames: _col0               |
|             Lateral View Forward                   |
|               Select Operator                      |
|                 expressions: _col0 (type: array<string>) |
|                 outputColumnNames: _col0           |
|                 Lateral View Join Operator         |
|                   outputColumnNames: _col0, _col1  |
|                   Lateral View Forward             |
|                     Select Operator                |
|                       expressions: _col1 (type: string) |
|                       outputColumnNames: _col1     |
|                       Lateral View Join Operator   |
|                         outputColumnNames: _col1, _col2 |
|                         Filter Operator            |
|                           isSamplingPred: false    |
|                           predicate: (_col1 <> _col2) (type: boolean) |
|                           Select Operator          |
|                             expressions: _col1 (type: string), _col2 (type: 
string) |
|                             outputColumnNames: _col0, _col1 |
|                             ListSink               |
|                     Select Operator                |
|                       expressions: _col0 (type: array<string>) |
|                       outputColumnNames: _col0     |
|                       UDTF Operator                |
|                         function name: explode     |
|                         Lateral View Join Operator |
|                           outputColumnNames: _col1, _col2 |
|                           Filter Operator          |
|                             isSamplingPred: false  |
|                             predicate: (_col1 <> _col2) (type: boolean) |
|                             Select Operator        |
|                               expressions: _col1 (type: string), _col2 (type: 
string) |
|                               outputColumnNames: _col0, _col1 |
|                               ListSink             |
|               Select Operator                      |
|                 expressions: Const array<string> [a, b] (type: array<string>) 
|
|                 outputColumnNames: _col0           |
|                 UDTF Operator                      |
|                   function name: explode           |
|                   Lateral View Join Operator       |
|                     outputColumnNames: _col0, _col1 |
|                     Lateral View Forward           |
|                       Select Operator              |
|                         expressions: _col1 (type: string) |
|                         outputColumnNames: _col1   |
|                         Lateral View Join Operator |
|                           outputColumnNames: _col1, _col2 |
|                           Filter Operator          |
|                             isSamplingPred: false  |
|                             predicate: (_col1 <> _col2) (type: boolean) |
|                             Select Operator        |
|                               expressions: _col1 (type: string), _col2 (type: 
string) |
|                               outputColumnNames: _col0, _col1 |
|                               ListSink             |
|                       Select Operator              |
|                         expressions: _col0 (type: array<string>) |
|                         outputColumnNames: _col0   |
|                         UDTF Operator              |
|                           function name: explode   |
|                           Lateral View Join Operator |
|                             outputColumnNames: _col1, _col2 |
|                             Filter Operator        |
|                               isSamplingPred: false |
|                               predicate: (_col1 <> _col2) (type: boolean) |
|                               Select Operator      |
|                                 expressions: _col1 (type: string), _col2 
(type: string) |
|                                 outputColumnNames: _col0, _col1 |
|                                 ListSink           |
|                                                    |
+----------------------------------------------------+
83 rows selected (0.145 seconds){code}

> CBO returns incorrect results when WHERE clause filters multiple LATERAL VIEW 
> explode cross-products
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29084
>                 URL: https://issues.apache.org/jira/browse/HIVE-29084
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.1.0, 4.0.1
>         Environment:   - Affected Versions: Hive 4.0.1, 4.2.0-SNAPSHOT 
> (likely all CBO+lateral view versions)
>   - Execution Engine: MapReduce (local mode)
>   - Metastore: Derby embedded
>            Reporter: Konstantin Bereznyakov
>            Assignee: Konstantin Bereznyakov
>            Priority: Major
>         Attachments: hive_cbo_lateral_view_where_bug.log, 
> hive_cbo_lateral_view_where_bug.sql
>
>
> When CBO is enabled, queries with multiple LATERAL VIEW explode operations 
> and WHERE clause filtering return incorrect results. The WHERE clause 
> selectivity is not properly applied during optimization, causing the query to 
> return rows that should be filtered out.
> I can reliably reproduce the bug with 2+ occurrences of "LATERAL VIEW 
> explode()" *and* a WHERE clause; it does not seem to happen with only one 
> "LATERAL VIEW explode()" or without a WHERE clause.
> Originally reproduced with published 4.0.1 container, then verified with 
> locally built current master branch.
> I was *not* able to reproduce it with unit tests.
> {{*Minimal Reproduction Case:*}}
> {{  }}
> {{  DROP TABLE IF EXISTS cbo_lateral_view_where_bug;}}
> {{  CREATE TABLE cbo_lateral_view_where_bug (values array<string>);}}
> {{  INSERT INTO cbo_lateral_view_where_bug SELECT array('a', 'b');}}
> {{  ANALYZE TABLE cbo_lateral_view_where_bug COMPUTE STATISTICS;}}{{  **  }}
> {{  *-- Correct result (CBO OFF)*}}
> {{  SET hive.cbo.enable=false;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{  WHERE first_val != second_val;}}{{  *-- Incorrect result (CBO ON)*}}
> {{  SET hive.cbo.enable=true;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{  WHERE first_val != second_val;}}{{  **  }}
> {{  *–- Expected vs Actual Results:*}}
> {{  - CBO OFF (correct): 2 rows: ('a','b') and ('b','a')}}
> {{  - CBO ON (incorrect): 4 rows: ('a','b'), ('b','a'), ('a','a'), ('b','b')}}
>   *Root Cause:*
>   CBO fails to properly estimate the selectivity of WHERE clauses that filter 
> cross-products created by multiple lateral views on the same array. The 
> optimizer appears to ignore or miscalculate the filtering effect, allowing 
> rows that should be excluded.
>   *Impact:*
>   - Data Correctness: Queries return wrong results, not just performance 
> issues
>   - Scope: Requires 2+ LATERAL VIEW explode operations with WHERE filtering
>   - Severity: Silent data corruption - users may not notice incorrect results
>   {*}Workaround{*}:
> Set {*}hive.cbo.enable{*}=false for queries with multiple lateral view 
> cross-products and WHERE clause filtering.
>  
> Attachments: 
> - [^hive_cbo_lateral_view_where_bug.sql] - SQL to reproduce
> - Hive log: [^hive_cbo_lateral_view_where_bug.log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to