Re: [PR] HIVE-27731: Iceberg: Perform metadata delete for queries with static filters [hive]

via GitHub Tue, 17 Oct 2023 00:39:25 -0700


deniskuzZ commented on code in PR #4748:
URL: https://github.com/apache/hive/pull/4748#discussion_r1361640112



##########
iceberg/iceberg-handler/src/test/results/positive/delete_iceberg_copy_on_write_partitioned.q.out:
##########
@@ -29,196 +29,13 @@ POSTHOOK: type: QUERY
 POSTHOOK: Input: default@tbl_ice
 POSTHOOK: Output: default@tbl_ice
 STAGE DEPENDENCIES:
-  Stage-1 is a root stage
-  Stage-2 depends on stages: Stage-1
-  Stage-0 depends on stages: Stage-2
-  Stage-3 depends on stages: Stage-0
+  Stage-4 is a root stage
 
 STAGE PLANS:
-  Stage: Stage-1
-    Tez
-#### A masked pattern was here ####
-      Edges:
-        Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE), Union 3 
(CONTAINS)
-        Reducer 4 <- Map 1 (SIMPLE_EDGE)
-        Reducer 6 <- Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
-#### A masked pattern was here ####
-      Vertices:
-        Map 1 
-            Map Operator Tree:
-                TableScan
-                  alias: tbl_ice
-                  filterExpr: (((a <> 22) and (b <> 'one') and (b <> 'four')) 
or (b) IN ('one', 'four') or (a = 22)) (type: boolean)
-                  Statistics: Num rows: 7 Data size: 672 Basic stats: COMPLETE 
Column stats: COMPLETE
-                  Filter Operator
-                    predicate: ((a <> 22) and (b <> 'one') and (b <> 'four') 
and FILE__PATH is not null) (type: boolean)
-                    Statistics: Num rows: 7 Data size: 672 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    Select Operator
-                      expressions: a (type: int), b (type: string), c (type: 
int), PARTITION__SPEC__ID (type: int), PARTITION__HASH (type: bigint), 
FILE__PATH (type: string), ROW__POSITION (type: bigint)
-                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6
-                      Statistics: Num rows: 7 Data size: 2100 Basic stats: 
COMPLETE Column stats: COMPLETE
-                      Reduce Output Operator
-                        key expressions: _col5 (type: string)
-                        null sort order: z
-                        sort order: +
-                        Map-reduce partition columns: _col5 (type: string)
-                        Statistics: Num rows: 7 Data size: 2100 Basic stats: 
COMPLETE Column stats: COMPLETE
-                        value expressions: _col0 (type: int), _col1 (type: 
string), _col2 (type: int), _col3 (type: int), _col4 (type: bigint), _col6 
(type: bigint)
-                  Filter Operator
-                    predicate: (((b) IN ('one', 'four') or (a = 22)) and 
FILE__PATH is not null) (type: boolean)
-                    Statistics: Num rows: 4 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    Reduce Output Operator
-                      key expressions: FILE__PATH (type: string)
-                      null sort order: a
-                      sort order: +
-                      Map-reduce partition columns: FILE__PATH (type: string)
-                      Statistics: Num rows: 4 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-            Execution mode: vectorized
-        Map 5 
-            Map Operator Tree:
-                TableScan
-                  alias: tbl_ice
-                  filterExpr: ((a = 22) or (b) IN ('one', 'four')) (type: 
boolean)
-                  Statistics: Num rows: 7 Data size: 672 Basic stats: COMPLETE 
Column stats: COMPLETE
-                  Filter Operator
-                    predicate: ((a = 22) or (b) IN ('one', 'four')) (type: 
boolean)
-                    Statistics: Num rows: 4 Data size: 384 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    Reduce Output Operator
-                      key expressions: FILE__PATH (type: string)
-                      null sort order: a
-                      sort order: +
-                      Map-reduce partition columns: FILE__PATH (type: string)
-                      Statistics: Num rows: 4 Data size: 384 Basic stats: 
COMPLETE Column stats: COMPLETE
-                      value expressions: a (type: int), b (type: string), c 
(type: int), PARTITION__SPEC__ID (type: int), PARTITION__HASH (type: bigint)
-            Execution mode: vectorized
-        Reducer 2 
-            Reduce Operator Tree:
-              Merge Join Operator
-                condition map:
-                     Left Semi Join 0 to 1
-                keys:
-                  0 _col5 (type: string)
-                  1 _col0 (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
_col6
-                Statistics: Num rows: 2 Data size: 600 Basic stats: COMPLETE 
Column stats: COMPLETE
-                Select Operator
-                  expressions: _col3 (type: int), _col4 (type: bigint), _col5 
(type: string), _col6 (type: bigint), _col0 (type: int), _col1 (type: string), 
_col2 (type: int)
-                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
_col6
-                  Statistics: Num rows: 2 Data size: 600 Basic stats: COMPLETE 
Column stats: COMPLETE
-                  File Output Operator
-                    compressed: false
-                    Statistics: Num rows: 4 Data size: 1200 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    table:
-                        input format: 
org.apache.iceberg.mr.hive.HiveIcebergInputFormat
-                        output format: 
org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
-                        serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe
-                        name: default.tbl_ice
-        Reducer 4 
-            Execution mode: vectorized
-            Reduce Operator Tree:
-              Select Operator
-                expressions: KEY.reducesinkkey0 (type: string)
-                outputColumnNames: _col5
-                Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE 
Column stats: COMPLETE
-                PTF Operator
-                  Function definitions:
-                      Input definition
-                        input alias: ptf_0
-                        output shape: _col5: string
-                        type: WINDOWING
-                      Windowing table definition
-                        input alias: ptf_1
-                        name: windowingtablefunction
-                        order by: _col5 ASC NULLS FIRST
-                        partition by: _col5
-                        raw input shape:
-                        window functions:
-                            window function definition
-                              alias: row_number_window_0
-                              name: row_number
-                              window function: GenericUDAFRowNumberEvaluator
-                              window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
-                              isPivotResult: true
-                  Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE 
Column stats: COMPLETE
-                  Filter Operator
-                    predicate: (row_number_window_0 = 1) (type: boolean)
-                    Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    Select Operator
-                      expressions: _col5 (type: string)
-                      outputColumnNames: _col0
-                      Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-                      Group By Operator
-                        keys: _col0 (type: string)
-                        minReductionHashAggr: 0.4
-                        mode: hash
-                        outputColumnNames: _col0
-                        Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-                        Reduce Output Operator
-                          key expressions: _col0 (type: string)
-                          null sort order: z
-                          sort order: +
-                          Map-reduce partition columns: _col0 (type: string)
-                          Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: COMPLETE
-        Reducer 6 
-            Execution mode: vectorized
-            Reduce Operator Tree:
-              Select Operator
-                expressions: VALUE._col0 (type: int), VALUE._col1 (type: 
string), VALUE._col2 (type: int), VALUE._col3 (type: int), VALUE._col4 (type: 
bigint), KEY.reducesinkkey0 (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
-                Statistics: Num rows: 4 Data size: 1168 Basic stats: COMPLETE 
Column stats: COMPLETE
-                PTF Operator
-                  Function definitions:
-                      Input definition
-                        input alias: ptf_0
-                        type: WINDOWING
-                      Windowing table definition
-                        input alias: ptf_1
-                        name: windowingtablefunction
-                        order by: _col5 ASC NULLS FIRST
-                        partition by: _col5
-                        raw input shape:
-                        window functions:
-                            window function definition
-                              alias: row_number_window_0
-                              name: row_number
-                              window function: GenericUDAFRowNumberEvaluator
-                              window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
-                              isPivotResult: true
-                  Statistics: Num rows: 4 Data size: 1168 Basic stats: 
COMPLETE Column stats: COMPLETE
-                  Filter Operator
-                    predicate: (row_number_window_0 = 1) (type: boolean)
-                    Statistics: Num rows: 2 Data size: 584 Basic stats: 
COMPLETE Column stats: COMPLETE
-                    Select Operator
-                      expressions: _col3 (type: int), _col4 (type: bigint), 
_col5 (type: string), -1L (type: bigint), _col0 (type: int), _col1 (type: 
string), _col2 (type: int)
-                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6
-                      Statistics: Num rows: 2 Data size: 600 Basic stats: 
COMPLETE Column stats: COMPLETE
-                      File Output Operator
-                        compressed: false
-                        Statistics: Num rows: 4 Data size: 1200 Basic stats: 
COMPLETE Column stats: COMPLETE
-                        table:
-                            input format: 
org.apache.iceberg.mr.hive.HiveIcebergInputFormat
-                            output format: 
org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
-                            serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe
-                            name: default.tbl_ice
-        Union 3 
-            Vertex: Union 3
-
-  Stage: Stage-2
-    Dependency Collection
-
-  Stage: Stage-0
-    Move Operator
-      tables:
-          replace: false
-          table:
-              input format: org.apache.iceberg.mr.hive.HiveIcebergInputFormat
-              output format: org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
-              serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe
-              name: default.tbl_ice
-
-  Stage: Stage-3
-    Stats Work
-      Basic Stats Work:
+  Stage: Stage-4
+    Execute operation
+      table name: default.tbl_ice
+      spec: AlterTableExecuteSpec{operationType=DELETE_METADATA, operationParams=org.apache.hadoop.hive.ql.parse.AlterTableExecuteSpec$#Masked#

Review Comment:
   Nice. One question: why didn't this change affect 
delete_iceberg_partitioned.q.out? Does this optimization kick in when 
split-update is on (see SplitUpdateSemanticAnalyzer)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


