bytesid19 opened a new issue, #11955: URL: https://github.com/apache/hudi/issues/11955
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

We recently migrated our Spark jobs from Hudi 0.10.1 to 0.15.0. The Hudi properties at that time were:

```
'hoodie.table.name': 'table_name',
'hoodie.datasource.write.table.name': 'table_name',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
'hoodie.datasource.write.recordkey.field': hudi_record_key,
'hoodie.datasource.write.partitionpath.field': 'txn_date',
'hoodie.datasource.write.precombine.field': 'ingestion_timestamp',
'hoodie.upsert.shuffle.parallelism': 100,
'hoodie.insert.shuffle.parallelism': 100
```

Hive meta sync was not enabled at that point. We then had a requirement to sync data to Hive via `hms` mode, so we added the following properties to the above:

```
'hoodie.datasource.meta.sync.enable': 'true',
'hoodie.datasource.hive_sync.mode': 'hms',
'hoodie.database.name': '##db_name##',
'hoodie.datasource.hive_sync.metastore.uris': '##hive_uri##'
```

The first Spark job we triggered after that ran successfully and added the new data to the Hudi table, but did not update the Hudi properties. Since then, every Spark job that writes new data fails with a NullPointerException:

```
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114)
at java.util.HashMap.forEach(HashMap.java:1290)
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113)
```
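For context, the writes go through the standard datasource path from PySpark (see the py4j frames in the stacktrace below). A minimal sketch of the write call, assuming a `hudi_options` dict holding all of the properties above; `df` and `table_path` are placeholders, not our actual job code:

```python
# Minimal sketch of the write call; df, table_path, and hudi_options are
# placeholders (hudi_options holds all of the properties listed above).
(df.write
   .format('hudi')
   .options(**hudi_options)
   .mode('append')
   .save(table_path))
```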
**To Reproduce**

Steps to reproduce the behavior (a minimal script is sketched after this list):

1. Migrate the Hudi version from 0.10.1 to 0.15.0 and start writing to the Hudi table without hive sync.
2. Add the hive meta sync related properties to the Hudi properties.
3. Write to the Hudi table via Spark jobs.
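To make the sequence concrete, a hypothetical minimal script under the same assumptions as the sketch above; `table_path`, the schema, and the row values are made up, and `base_options` stands for the first property set shown earlier:

```python
from pyspark.sql import SparkSession

# Hypothetical minimal repro; table_path, the schema, and the values are
# placeholders, and base_options stands for the first property set above.
spark = SparkSession.builder.appName('hudi-11955-repro').getOrCreate()
table_path = 's3://bucket/path/to/table'

df = spark.createDataFrame(
    [('k1', '2024-09-01', '2024-09-01T00:00:00Z')],
    ['record_key', 'txn_date', 'ingestion_timestamp'],
)

# Step 1: on Hudi 0.15.0 (table originally written by 0.10.1), write
# without hive sync enabled.
df.write.format('hudi').options(**base_options).mode('append').save(table_path)

# Step 2: add the hive meta sync properties.
sync_options = {
    **base_options,
    'hoodie.datasource.meta.sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.database.name': '##db_name##',
    'hoodie.datasource.hive_sync.metastore.uris': '##hive_uri##',
}

# Step 3: write again. The first such write succeeded for us; every write
# after that fails with the NullPointerException in the stacktrace below.
df.write.format('hudi').options(**sync_options).mode('append').save(table_path)
```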
**Expected behavior**

* The job should complete successfully.
* The Hudi table should be updated with the new data.
* The Hudi properties should include `hoodie.database.name`.
* Metadata should be synced to Hive successfully.

**Environment Description**

* Hudi version : 0.15.0
* Spark version : 3.3.2
* Hive version : 3.1.2
* Hadoop version : 3.0.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
An error occurred while calling o1417.save.
: org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:82)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:79)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:256)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:165)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:256)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$9(SQLExecution.scala:258)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:448)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:203)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1073)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:131)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:398)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:255)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:238)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:251)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:244)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:339)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:335)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:244)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:395)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:244)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:198)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:189)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:305)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:964)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:429)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:396)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:250)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing LH820YOhaC01kt
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170)
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
	... 58 more
Caused by: java.lang.NullPointerException
	at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114)
	at java.util.HashMap.forEach(HashMap.java:1290)
	at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110)
	at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167)
	... 59 more
```
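Possibly useful for triage: judging from the frames above (not from reading the 0.15.0 source), the NPE surfaces while `TimelineUtils.getDroppedPartitions` iterates commit metadata on the timeline. A hypothetical diagnostic to see which instants the sync path would scan, listing the `.hoodie` timeline files through the JVM filesystem API; `spark` is an active SparkSession and `table_path` is a placeholder:

```python
# Hypothetical diagnostic: list instant files in the .hoodie timeline via the
# JVM filesystem API; table_path is a placeholder for the table's S3 base path.
hadoop_conf = spark._jsc.hadoopConfiguration()
timeline_path = spark._jvm.org.apache.hadoop.fs.Path(table_path + '/.hoodie')
fs = timeline_path.getFileSystem(hadoop_conf)
for status in fs.listStatus(timeline_path):
    name = status.getPath().getName()
    # getDroppedPartitions appears (per the stacktrace) to walk completed
    # commits, so replacecommit instants seem like the first thing to inspect.
    if 'replacecommit' in name:
        print(name)
```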
