[GitHub] [hudi] sathyaprakashg opened a new issue #2212: Hive incompatible change issue when using hive sync

GitBox Wed, 28 Oct 2020 09:18:13 -0700


sathyaprakashg opened a new issue #2212:
URL: https://github.com/apache/hudi/issues/2212



   **Describe the problem you faced**
   
   We are getting the following error during Hive sync: `FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions`. We tried `set hive.metastore.disallow.incompatible.col.type.changes=false;` as suggested [here](https://cwiki.apache.org/confluence/display/HUDI/Troubleshooting+Guide#TroubleshootingGuide-3.1Causedby:java.sql.SQLException:Errorwhileprocessingstatement:FAILED:ExecutionError,returncode1fromorg.apache.hadoop.hive.ql.exec.DDLTask.Unabletoaltertable.Thefollowingcolumnshavetypesincompatiblewiththeexistingcolumnsintheirrespec), but we still get the same error.
   
   **To Reproduce**
   
   I manually ran the Hive statements below on our EMR cluster (emr-5.29.0, Hive 2.3.6) to simulate the incompatible scenario. Setting the above config did not allow the incompatible changes.
   
   > hive> set hive.metastore.disallow.incompatible.col.type.changes=false;
   > hive> set hive.metastore.disallow.incompatible.col.type.changes;
   > hive.metastore.disallow.incompatible.col.type.changes=false
   > hive> create table incompatible_change_1 (amount string);
   > OK
   > Time taken: 0.848 seconds
   > hive> alter  table incompatible_change_1 replace columns(amount int);
   > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
   > amount
   > hive> create table incompatible_change_2 (col1 struct<a:string,b:string>);
   > OK
   > Time taken: 0.074 seconds
   > hive> alter  table incompatible_change_2 replace columns(col1 struct<a:string,b:string,c:string>);
   > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
   > col1
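
Both failures in the transcript follow the same pattern: Hive compares each existing column with the column at the same position in the new definition and lists every column whose type change it rejects. The sketch below is a simplified Python model of such a positional check, written only to illustrate the behaviour seen here. It is not Hive's implementation; the `ALLOWED_WIDENINGS` set and the function names are hypothetical. Complex types such as `struct<...>` are treated as whole type strings, so appending field `c` to `col1` counts as an incompatible change, just as Hive reported.

```python
# Simplified, hypothetical model of a positional column-type check.
# NOT Hive's implementation: the whitelist below is an assumption.

# Hypothetical (old_type, new_type) pairs treated as compatible widenings.
ALLOWED_WIDENINGS = {
    ("int", "bigint"),
    ("float", "double"),
}


def normalize(type_str):
    """Lower-case a type string and drop whitespace so equal types compare equal."""
    return "".join(type_str.lower().split())


def is_compatible(old_type, new_type):
    """Accept an unchanged type or a whitelisted primitive widening."""
    old, new = normalize(old_type), normalize(new_type)
    # struct<...> and other complex types are compared as whole strings,
    # so adding a field to a struct is reported as incompatible.
    return old == new or (old, new) in ALLOWED_WIDENINGS


def incompatible_columns(old_cols, new_cols):
    """Return new column names whose type change is rejected, compared by position.

    old_cols and new_cols are lists of (name, type) pairs in table order;
    columns appended beyond the old schema are not checked.
    """
    return [
        new_name
        for (_, old_type), (new_name, new_type) in zip(old_cols, new_cols)
        if not is_compatible(old_type, new_type)
    ]


# The two ALTER TABLE attempts from the transcript above.
print(incompatible_columns([("amount", "string")], [("amount", "int")]))
# ['amount']
print(incompatible_columns(
    [("col1", "struct<a:string,b:string>")],
    [("col1", "struct<a:string,b:string,c:string>")],
))
# ['col1']
```

Under this model, no session-level flag changes the outcome, because the decision depends only on the old and new type strings at each position.
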
   
   **Expected behavior**
   
   With `set hive.metastore.disallow.incompatible.col.type.changes=false;`, Hive should allow incompatible changes. We expect additional fields to be added to struct types as the schema evolves, and Hudi should be able to handle this without error.
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.4
   
   * Hive version : 2.3.6
   
   * Hadoop version : Amazon 2.8.5
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : No
   
   * EMR version : emr-5.29.0
   
   
   
   **Stacktrace**
   
   ```
   20/10/28 16:12:57 ERROR HiveSyncTool: Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE table1 REPLACE COLUMNS(......)
        at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:352)
        at org.apache.hudi.hive.HoodieHiveClient.updateTableDefinition(HoodieHiveClient.java:250)
        at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:183)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
        at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   
   Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
   col1,col2,col3
        at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
        at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:363)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
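
The failing statement in the trace is an `ALTER TABLE ... REPLACE COLUMNS(...)`, which restates the complete column list in order, so every existing column is checked against whatever now sits at its position. As a rough illustration only, the hypothetical helper `replace_columns_ddl` below builds a statement of that shape from `(name, type)` pairs. It is an assumption for illustration, not Hudi's actual code, and it does not reconstruct the real column list, which is elided as `......` in the log.

```python
# Hypothetical helper, NOT Hudi's implementation: builds a Hive
# ALTER TABLE ... REPLACE COLUMNS statement from (name, type) pairs.
def replace_columns_ddl(table, columns):
    """Return a REPLACE COLUMNS statement that re-declares every column in order."""
    col_defs = ", ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return f"ALTER TABLE {table} REPLACE COLUMNS({col_defs})"


print(replace_columns_ddl("table1", [
    ("amount", "int"),
    ("col1", "struct<a:string,b:string,c:string>"),
]))
# ALTER TABLE table1 REPLACE COLUMNS(`amount` int, `col1` struct<a:string,b:string,c:string>)
```

Because the whole column list is re-declared, a single widened struct anywhere in the schema is enough for the statement as a whole to be rejected.
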
   
   

