[I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

via GitHub Thu, 07 Dec 2023 07:55:34 -0800


MikeMccree opened a new issue, #10273:
URL: https://github.com/apache/hudi/issues/10273


   **Describe the problem you faced**
   
   Attempting to upgrade my hudi version and emr cluster, but running into sync 
issues / tabletype issues and partition sync issues.
   Seems the DB gets created and the tables are created, but there is no data 
inserted into these tables.
   
   None of the above experienced in my previous emr cluster.
   
   **To Reproduce**
   
   Steps to reproduce the behavior (For me):
   
   1. Spin up new EMR cluster (emr-6.15.0 or 6.14.0)
   2. Along with this, comes hudi 0.14 as seen in usr/lib/hudi
   3. Try run my exact script which was working on emr-6.10 with hudi 0.12.2
   4. I deleted destination s3 objects, glue db and glue tables
   
   **Expected behavior**
   I would assume my expectations would be to have the same script run 
successfully and insert data.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.4.1
   
   * Hive version : 3.1.1
   
   * Hadoop version : 3.3.6
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   config I am using:
   
   `hudi_streaming_count_options = {
       'hoodie.database.name': database_name,
       'hoodie.table.name': loggingtablename,
       'hoodie.datasource.hive_sync.database': database_name,
       'hoodie.datasource.hive_sync.table': loggingtablename,
       'hoodie.datasource.hive_sync.create_managed_table': 'true',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.support_timestamp': 'true',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.write.precombine.field': 'start_time',
       'hoodie.datasource.write.partitionpath.field': "table_name, batch_id",
       'hoodie.datasource.hive_sync.partition_fields': "table_name, batch_id",
       'hoodie.datasource.write.table.type' : 'COPY_ON_WRITE',
       'hoodie.datasource.write.recordkey.field': "batch_id",
       'hoodie.datasource.write.operation': 'upsert', 
       "hoodie.write.num.retries.on.conflict.failures": "15",
       }`
   
   **Stacktrace**
   
   `An error occurred while calling o344.save.
   : org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using 
the meta sync class org.apache.hudi.hive.HiveSyncTool
        at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:993)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:991)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1089)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
        at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
        at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
        at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
        at 
org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
        at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at 
py4j.ClientServerConnection.sendCommand(ClientServerConnection.java:244)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:384)
        at py4j.CallbackClient.sendCommand(CallbackClient.java:356)
        at 
py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106)
        at com.sun.proxy.$Proxy70.call(Unknown Source)
        at 
org.apache.spark.sql.execution.streaming.sources.PythonForeachBatchHelper$.$anonfun$callForeachBatch$1(ForeachBatchSink.scala:57)
        at 
org.apache.spark.sql.execution.streaming.sources.PythonForeachBatchHelper$.$anonfun$callForeachBatch$1$adapted(ForeachBatchSink.scala:57)
        at 
org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:38)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:729)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
        at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
        at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
        at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:726)
        at 
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
        at 
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:726)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:284)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
        at 
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:247)
        at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67)
        at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:237)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:306)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:284)
        at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:207)
   Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing logging_table
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:168)
        at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
        ... 89 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table logging_table
        at 
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:415)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:280)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:177)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:165)
        ... 90 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: 
michael_streaming_test.logging_table add partition failed
        at 
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:217)
        at 
org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:115)
        at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:457)
        at 
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:411)
        ... 93 more
   Caused by: InvalidObjectException(message:TableType cannot be null for 
table: logging_table (Service: null; Status Code: 0; Error Code: null; Request 
ID: null; Proxy: null))
        at 
com.amazonaws.glue.catalog.converters.BaseCatalogToHiveConverter$5.get(BaseCatalogToHiveConverter.java:70)
        at 
com.amazonaws.glue.catalog.converters.BaseCatalogToHiveConverter.getHiveException(BaseCatalogToHiveConverter.java:109)
        at 
com.amazonaws.glue.catalog.converters.BaseCatalogToHiveConverter.wrapInHiveException(BaseCatalogToHiveConverter.java:100)
        at 
com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getTable(GlueMetastoreClientDelegate.java:450)
        at 
com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.getTable(AWSCatalogMetastoreClient.java:1008)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2350)
        at com.sun.proxy.$Proxy94.getTable(Unknown Source)
        at 
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:197)
        ... 96 more`
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

Reply via email to