GurRonenExplorium opened a new issue #1856: URL: https://github.com/apache/hudi/issues/1856
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Hey, tl;dr: Hive Sync is failing on `alter table cascade`.

I am running a PoC with Hudi and started working with a timeseries dataset we have. The input is partitioned by insertion_time, with late data arriving at most 48 hours behind. The output is the same dataset, partitioned by event_time and with some additional fields (all derived row by row, with no aggregations).

Setup: AWS EMR with transient clusters (Spark for the job itself, Hive for access to the Glue metastore for the HiveSync tool - by the way, if there is a better way to do this I'm happy to hear it).

Steps I took:

1. Loaded one day of data (worked well).
2. Loaded a few extra days in single-partition batches (each run covered a single insertion-time partition); everything synced well.
3. Ran on a full month of data in a single job.
4. The data was successfully loaded to Hudi, but HiveSync failed with an alter table error.

**Expected behavior**

Hive Sync shouldn't crash when syncing to the Glue catalog.

**Environment Description**

* Hudi version : 0.5.3
* Spark version : 2.4.5
* Hive version : 2.3.6
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker?
  (yes/no) : no
* EMR version : 5.30.1

**Stacktrace**

The stacktrace is a bit redacted; if anything more is needed I can get it.

```
20/07/19 19:27:47 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE `#DB_NAME#`.`#TABLE_NAME#` REPLACE COLUMNS(`_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `utc_timestamp` string, `local_timestamp_with_timezone` string, `utc_timestamp_with_timezone` string, `#COL1#` string, `#COL2#` string, `#COL3#` double, `#COL4#` double, `#COL5#` string, `#COL6#` string, `#COL7#` double, `#COL8#` double, `#COL9#` string, `#COL10#` bigint, `#COL11#` string, `#COL12#` string, `#COL13#` string, `#COL14#` string, `#COL15#` string, `#COL16#` string, `#COL17#` string, `#COL18#` int, `hash_id` string, `#REDACTED#_6` string, `#REDACTED#_7` string, `#REDACTED#_8` string, `#REDACTED#_9` string, `#REDACTED#_10` string, `#REDACTED#_11` string, `offset_year` int, `offset_month` int, `offset_dayofmonth` int, `offset_dayofweek` int, `offset_hourofday` int ) cascade
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:482)
    at org.apache.hudi.hive.HoodieHiveClient.updateTableDefinition(HoodieHiveClient.java:261)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:164)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:114)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:229)
    at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:279)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:184)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at ai.explorium.reveal.RevealS3IngestorApp$.main(RevealS3IngestorApp.scala:89)
    at ai.explorium.reveal.RevealS3IngestorApp.main(RevealS3IngestorApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cascade for alter_table is not supported
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
    at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:363)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Cascade for alter_table is not supported
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:509)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table_with_environmentContext(AWSCatalogMetastoreClient.java:438)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
    at com.sun.proxy.$Proxy42.alter_table_with_environmentContext(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:628)
    at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3590)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:390)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
    ... 11 more
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:297)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:480)
    ... 47 more
20/07/19 19:27:47 INFO SparkContext: Invoking stop() from shutdown hook
20/07/19 19:27:47 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-64-38.eu-west-1.compute.internal:4040
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Shutting down all executors
20/07/19 19:27:47 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/07/19 19:27:47 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/07/19 19:27:47 INFO YarnClientSchedulerBackend: Stopped
20/07/19 19:27:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/19 19:27:47 INFO MemoryStore: MemoryStore cleared
20/07/19 19:27:47 INFO BlockManager: BlockManager stopped
20/07/19 19:27:47 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/19 19:27:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/19 19:27:47 INFO SparkContext: Successfully stopped SparkContext
20/07/19 19:27:47 INFO ShutdownHookManager: Shutdown hook called
20/07/19 19:27:47 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7ba98d71-ce9e-4f47-838d-02093ea288fc
20/07/19 19:27:47 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-bc6a9489-a6d0-47c9-a30b-04c538bf519e
Command exiting with ret '0'
```

Thanks for this project!
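For anyone hitting this before a fix lands: the innermost cause (`GlueMetastoreClientDelegate.alterTable`) shows that the Glue client rejects any `ALTER TABLE ... CASCADE`, so one possible workaround is to apply the schema change through the Glue API directly instead of HiveQL. The sketch below is illustrative only, not Hudi behavior: the helper names `merge_columns` and `replace_columns_via_glue` are mine, and the Glue call assumes `boto3` with appropriate permissions.

```python
def merge_columns(existing, new_cols):
    """Append columns from new_cols that are not already present (by Name).

    Columns are Glue-style dicts, e.g. {"Name": "offset_year", "Type": "int"}.
    """
    seen = {c["Name"] for c in existing}
    return existing + [c for c in new_cols if c["Name"] not in seen]


def replace_columns_via_glue(database, table, new_cols, client=None):
    """Fetch a table definition from Glue, merge in new columns, write it back.

    Hypothetical workaround helper; requires boto3 and Glue permissions.
    """
    import boto3  # imported lazily so the pure merge logic needs no AWS deps

    client = client or boto3.client("glue")
    current = client.get_table(DatabaseName=database, Name=table)["Table"]
    sd = current["StorageDescriptor"]
    sd["Columns"] = merge_columns(sd["Columns"], new_cols)
    # update_table accepts only a subset of get_table's fields in TableInput,
    # so copy across the ones relevant to a schema change.
    table_input = {
        "Name": current["Name"],
        "StorageDescriptor": sd,
        "PartitionKeys": current.get("PartitionKeys", []),
        "TableType": current.get("TableType", "EXTERNAL_TABLE"),
        "Parameters": current.get("Parameters", {}),
    }
    client.update_table(DatabaseName=database, TableInput=table_input)
```

Unlike `REPLACE COLUMNS ... CASCADE`, this only touches the table-level schema; existing partition schemas are left as-is, which is usually acceptable for additive changes like the `offset_*` columns above.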
