[
https://issues.apache.org/jira/browse/HIVE-26925?focusedWorklogId=840009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-840009
]
ASF GitHub Bot logged work on HIVE-26925:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/23 15:57
Start Date: 18/Jan/23 15:57
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3939:
URL: https://github.com/apache/hive/pull/3939#discussion_r1073656566
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##########
@@ -958,6 +940,31 @@ public Table toTable(HiveConf conf) throws HiveException {
return tbl;
}
+ public static void setColumnsAndStorePartitionTransformSpec(
Review Comment:
Consider moving the method in `DDLUtils` or `AlterTableUtils`.
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##########
@@ -958,6 +940,31 @@ public Table toTable(HiveConf conf) throws HiveException {
return tbl;
}
+ public static void setColumnsAndStorePartitionTransformSpec(
+ List<FieldSchema> columns, List<FieldSchema> partitionColumns,
+ HiveConf conf, Table tbl, HiveStorageHandler storageHandler)
Review Comment:
`storageHandler` can be obtained from `tbl` so we could omit this argument.
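For illustration, if the suggestion is taken the signature could shrink to something like the sketch below. This assumes the `Table` parameter is `org.apache.hadoop.hive.ql.metadata.Table`, whose `getStorageHandler()` can supply the handler; the body is elided:
```
// Hypothetical simplified signature; the handler is derived inside the method.
public static void setColumnsAndStorePartitionTransformSpec(
    List<FieldSchema> columns, List<FieldSchema> partitionColumns,
    HiveConf conf, Table tbl) throws HiveException {
  HiveStorageHandler storageHandler = tbl.getStorageHandler();
  // ... rest of the method unchanged ...
}
```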
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -13720,15 +13720,7 @@ ASTNode analyzeCreateTable(
}
break;
case HiveParser.TOK_TABLEPARTCOLSBYSPEC:
- List<TransformSpec> partitionTransformSpec =
- PartitionTransform.getPartitionTransformSpec(child);
-
-      if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC,
-          partitionTransformSpec)) {
-        throw new SemanticException("Query state attached to Session state must be not null. " +
-            "Partition transform metadata cannot be saved.");
-      }
-
Review Comment:
Since the `if` block appears in at least three places, consider refactoring it
independently. It could be done as a more generic method in
`SessionStateUtil`.
```java
public static void addResourceOrThrow(Configuration conf, String key, Object resource) {
  Optional<QueryState> queryState = getQueryState(conf);
  if (queryState.isPresent()) {
    queryState.get().addResource(key, resource);
  } else {
    throw new IllegalStateException("Query state is missing; failed to add resource for " + key);
  }
}
```
Then it's up to you whether to introduce `parseAndStorePartitionTransformSpec` or not.
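For illustration, the "add or throw" pattern proposed above can be exercised in isolation with a self-contained sketch. Here a plain `Map` stands in for Hive's `QueryState`, and a session-id string stands in for the `Configuration` lookup; both stand-ins are assumptions for illustration only, not Hive's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AddResourceOrThrowSketch {

    // One "query state" (resource map) per session id, mimicking the lookup
    // that SessionStateUtil.getQueryState(conf) performs in Hive (assumption).
    private static final Map<String, Map<String, Object>> QUERY_STATES = new HashMap<>();

    static Optional<Map<String, Object>> getQueryState(String sessionId) {
        return Optional.ofNullable(QUERY_STATES.get(sessionId));
    }

    // Adds the resource if the query state exists, otherwise throws -- the
    // behavior the review comment proposes centralizing in one method.
    static void addResourceOrThrow(String sessionId, String key, Object resource) {
        Map<String, Object> state = getQueryState(sessionId)
            .orElseThrow(() -> new IllegalStateException(
                "Query state is missing; failed to add resource for " + key));
        state.put(key, resource);
    }

    public static void main(String[] args) {
        QUERY_STATES.put("s1", new HashMap<>());
        addResourceOrThrow("s1", "partition-transform-spec", "bucket(16, id)");
        System.out.println(QUERY_STATES.get("s1").get("partition-transform-spec"));

        try {
            addResourceOrThrow("no-such-session", "partition-transform-spec", "x");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Each of the three call sites then collapses to a single line, and the inconsistency of per-site error messages disappears.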
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -14190,11 +14200,17 @@ protected ASTNode analyzeCreateView(ASTNode ast, QB qb, PlannerContext plannerCt
storageFormat.getSerdeProps());
}
break;
+ case HiveParser.TOK_TABLEPARTCOLSBYSPEC:
+ parseAndStorePartitionTransformSpec(child);
+ partitionTransformSpecExists = true;
+ break;
default:
assert false;
}
}
+      validateStorageFormat(storageFormat, tblProps, partitionTransformSpecExists);
Review Comment:
To test this code path completely, it may be useful to add some negative
tests with views and unsupported partition-by clauses.
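A negative case along these lines might look like the following qtest-style sketch (the view name is hypothetical, and the exact partition-spec syntax is an assumption based on Hive's `PARTITIONED BY SPEC` grammar); the expectation is a SemanticException from the new validation rather than a successful compile:
```
-- hypothetical negative qtest: partition transform spec with a non-Iceberg format,
-- expected to fail in validateStorageFormat
create materialized view mv_neg_part_spec partitioned by spec (bucket(16, t))
stored as orc as
select b, t from all100k group by b, t;
```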
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##########
@@ -958,6 +940,31 @@ public Table toTable(HiveConf conf) throws HiveException {
return tbl;
}
+ public static void setColumnsAndStorePartitionTransformSpec(
Review Comment:
Consider including `table` somewhere in the method name to better reflect
its purpose.
Issue Time Tracking
-------------------
Worklog Id: (was: 840009)
Time Spent: 2h (was: 1h 50m)
> MV with iceberg storage format fails when it contains a 'PARTITIONED ON'
> clause due to column number/types difference.
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26925
> URL: https://issues.apache.org/jira/browse/HIVE-26925
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Reporter: Dharmik Thakkar
> Assignee: Krisztian Kasa
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> MV with iceberg storage format fails when it contains a 'PARTITIONED ON'
> clause due to column number/types difference.
> {code:java}
> !!! annotations iceberg
> >>> use iceberg_test_db_hive;
> No rows affected
> >>> set hive.exec.max.dynamic.partitions=2000;
> >>> set hive.exec.max.dynamic.partitions.pernode=2000;
> >>> drop materialized view if exists mv_agg_gby_col_partitioned;
> >>> create materialized view mv_agg_gby_col_partitioned PARTITIONED ON (t)
> >>> stored by iceberg stored as orc tblproperties ('format-version'='1') as
> >>> select b,f,sum(b), sum(f),t from all100k group by b,f,v,c,t;
> >>> analyze table mv_agg_gby_col_partitioned compute statistics for columns;
> >>> set hive.explain.user=false;
> >>> explain select b,f,sum(b) from all100k where t=93 group by c,v,f,b;
> !!! match row_contains
> alias: iceberg_test_db_hive.mv_agg_gby_col_partitioned
> >>> drop materialized view mv_agg_gby_col_partitioned;
> {code}
> Error
> {code:java}
> 2023-01-10T20:31:17,514 INFO [pool-5-thread-1] jdbc.TestDriver: Query:
> create materialized view mv_agg_gby_col_partitioned PARTITIONED ON (t) stored
> by iceberg stored as orc tblproperties ('format-version'='1') as select
> b,f,sum(b), sum(f),t from all100k group by b,f,v,c,t
> 2023-01-10T20:31:18,099 INFO [Thread-21] jdbc.TestDriver: INFO : Compiling
> command(queryId=hive_20230110203117_6c333b6a-1642-40e7-80bc-e78dede47980):
> create materialized view mv_agg_gby_col_partitioned PARTITIONED ON (t) stored
> by iceberg stored as orc tblproperties ('format-version'='1') as select
> b,f,sum(b), sum(f),t from all100k group by b,f,v,c,t
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: INFO : No Stats
> for iceberg_test_db_hive@all100k, Columns: b, c, t, f, v
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: ERROR : FAILED:
> SemanticException Line 0:-1 Cannot insert into target table because column
> number/types are different 'TOK_TMP_FILE': Table insclause-0 has 6 columns,
> but query has 5 columns.
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver:
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Cannot insert
> into target table because column number/types are different 'TOK_TMP_FILE':
> Table insclause-0 has 6 columns, but query has 5 columns.
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:8905)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:8114)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11583)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11455)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12424)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12290)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:13038)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:756)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13154)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:472)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:313)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:222)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:201)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:657)
> 2023-01-10T20:31:18,100 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:603)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:597)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.security.AccessController.doPrivileged(Native Method)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: at
> java.base/java.lang.Thread.run(Thread.java:829)
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver:
> 2023-01-10T20:31:18,101 INFO [Thread-21] jdbc.TestDriver: INFO : Completed
> compiling
> command(queryId=hive_20230110203117_6c333b6a-1642-40e7-80bc-e78dede47980);
> Time taken: 0.526 seconds
> 2023-01-10T20:31:18,306 ERROR [pool-5-thread-1] jdbc.TestDriver: Error while
> compiling statement: FAILED: SemanticException Line 0:-1 Cannot insert into
> target table because column number/types are different 'TOK_TMP_FILE': Table
> insclause-0 has 6 columns, but query has 5 columns. {code}
> A similar query works for Hive native materialized views.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)