GurRonenExplorium opened a new issue #1856: URL: https://github.com/apache/hudi/issues/1856
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Hey, tl;dr: Hive Sync is failing on `alter table cascade`.

I am running a PoC with Hudi and started working with a timeseries dataset we have. The input is partitioned by insertion_time, with late data arriving at most 48 hours behind. The output is the same dataset, partitioned by event_time and with some additional fields (all derived row by row, with no aggregations).

Setup: AWS EMR with transient clusters (Spark for the job itself, Hive for access to the Glue metastore for the HiveSync tool - by the way, if there is a better way to do this I'm happy to hear it).

Steps I took:

1. Loaded one day of data (worked well).
2. Loaded a few extra days in single-partition batches (each run covered a single insertion-time partition); everything synced well.
3. Ran on a full month of data in a single job.
4. The data was successfully loaded to Hudi, but HiveSync failed with an alter table error.

**Expected behavior**

Hive Sync shouldn't crash when syncing to the Glue catalog.

**Environment Description**

* Hudi version : 0.5.3
* Spark version : 2.4.5
* Hive version : 2.3.6
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker?
  (yes/no) : no
* EMR version : 5.30.1

**Stacktrace**

The stacktrace is a bit redacted; if anything more is needed I can get it.

```
20/07/19 19:27:47 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE `#DB_NAME#`.`#TABLE_NAME#` REPLACE COLUMNS(`_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `utc_timestamp` string, `local_timestamp_with_timezone` string, `utc_timestamp_with_timezone` string, `#COL1#` string, `#COL2#` string, `#COL3#` double, `#COL4#` double, `#COL5#` string, `#COL6#` string, `#COL7#` double, `#COL8#` double, `#COL9#` string, `#COL10#` bigint, `#COL11#` string, `#COL12#` string, `#COL13#` string, `#COL14#` string, `#COL15#` string, `#COL16#` string, `#COL17#` string, `#COL18#` int, `hash_id` string, `#REDACTED#_6` string, `#REDACTED#_7` string, `#REDACTED#_8` string, `#REDACTED#_9` string, `#REDACTED#_10` string, `#REDACTED#_11` string, `offset_year` int, `offset_month` int, `offset_dayofmonth` int, `offset_dayofweek` int, `offset_hourofday` int ) cascade
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:482)
    at org.apache.hudi.hive.HoodieHiveClient.updateTableDefinition(HoodieHiveClient.java:261)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:164)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:114)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:229)
    at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:279)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:184)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at ai.explorium.reveal.RevealS3IngestorApp$.main(RevealS3IngestorApp.scala:89)
    at ai.explorium.reveal.RevealS3IngestorApp.main(RevealS3IngestorApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cascade for alter_table is not supported
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
    at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:363)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Cascade for alter_table is not supported
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:509)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table_with_environmentContext(AWSCatalogMetastoreClient.java:438)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
    at com.sun.proxy.$Proxy42.alter_table_with_environmentContext(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:628)
    at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3590)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:390)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
    ... 11 more
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:297)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:480)
    ... 47 more
20/07/19 19:27:47 INFO SparkContext: Invoking stop() from shutdown hook
20/07/19 19:27:47 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-64-38.eu-west-1.compute.internal:4040
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Shutting down all executors
20/07/19 19:27:47 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/07/19 19:27:47 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Stopped
20/07/19 19:27:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/19 19:27:47 INFO MemoryStore: MemoryStore cleared
20/07/19 19:27:47 INFO BlockManager: BlockManager stopped
20/07/19 19:27:47 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/19 19:27:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/19 19:27:47 INFO SparkContext: Successfully stopped SparkContext
20/07/19 19:27:47 INFO ShutdownHookManager: Shutdown hook called
20/07/19 19:27:47 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7ba98d71-ce9e-4f47-838d-02093ea288fc
20/07/19 19:27:47 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-bc6a9489-a6d0-47c9-a30b-04c538bf519e
Command exiting with ret '0'
```

Thanks for this project!
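For anyone hitting this before a fix lands: the innermost cause (`GlueMetastoreClientDelegate.alterTable`) shows that the Glue client rejects any `ALTER TABLE ... CASCADE`, so one possible workaround is to apply the schema change through the Glue API directly instead of HiveQL. The sketch below is illustrative only, not Hudi behavior: the helper names `merge_columns` and `replace_columns_via_glue` are mine, and the Glue call assumes `boto3` with appropriate permissions.

```python
def merge_columns(existing, new_cols):
    """Append columns from new_cols that are not already present (by Name).

    Columns are Glue-style dicts, e.g. {"Name": "offset_year", "Type": "int"}.
    """
    seen = {c["Name"] for c in existing}
    return existing + [c for c in new_cols if c["Name"] not in seen]


def replace_columns_via_glue(database, table, new_cols, client=None):
    """Fetch a table definition from Glue, merge in new columns, write it back.

    Hypothetical workaround helper; requires boto3 and Glue permissions.
    """
    import boto3  # imported lazily so the pure merge logic needs no AWS deps

    client = client or boto3.client("glue")
    current = client.get_table(DatabaseName=database, Name=table)["Table"]
    sd = current["StorageDescriptor"]
    sd["Columns"] = merge_columns(sd["Columns"], new_cols)
    # update_table accepts only a subset of get_table's fields in TableInput,
    # so copy across the ones relevant to a schema change.
    table_input = {
        "Name": current["Name"],
        "StorageDescriptor": sd,
        "PartitionKeys": current.get("PartitionKeys", []),
        "TableType": current.get("TableType", "EXTERNAL_TABLE"),
        "Parameters": current.get("Parameters", {}),
    }
    client.update_table(DatabaseName=database, TableInput=table_input)
```

Unlike `REPLACE COLUMNS ... CASCADE`, this only touches the table-level schema; existing partition schemas are left as-is, which is usually acceptable for additive changes like the `offset_*` columns above.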
