bytesid19 opened a new issue, #11955: URL: https://github.com/apache/hudi/issues/11955
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

We recently migrated our Spark jobs from Hudi 0.10.1 to 0.15.0. The Hudi properties at that time were:

```
'hoodie.table.name': 'table_name',
'hoodie.datasource.write.table.name': 'table_name',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
'hoodie.datasource.write.recordkey.field': hudi_record_key,
'hoodie.datasource.write.partitionpath.field': 'txn_date',
'hoodie.datasource.write.precombine.field': 'ingestion_timestamp',
'hoodie.upsert.shuffle.parallelism': 100,
'hoodie.insert.shuffle.parallelism': 100
```

Hive meta sync was not enabled at that point. We then had a requirement to sync data to Hive via `hms` mode, so we added the following properties to the above:

```
'hoodie.datasource.meta.sync.enable': 'true',
'hoodie.datasource.hive_sync.mode': 'hms',
'hoodie.database.name': '##db_name##',
'hoodie.datasource.hive_sync.metastore.uris': '##hive_uri##'
```

The first Spark job we triggered after that ran successfully and added the new data to the Hudi table, but did not update the Hudi properties. Since then, every Spark job that writes new data fails with a NullPointerException:

```
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114)
at java.util.HashMap.forEach(HashMap.java:1290)
at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113)
```
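For context, the writes go through the standard datasource path from PySpark (see the py4j frames in the stacktrace below). A minimal sketch of the write call, assuming a `hudi_options` dict holding all of the properties above; `df` and `table_path` are placeholders, not our actual job code:

```python
# Minimal sketch of the write call; df, table_path, and hudi_options are
# placeholders (hudi_options holds all of the properties listed above).
(df.write
   .format('hudi')
   .options(**hudi_options)
   .mode('append')
   .save(table_path))
```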
**To Reproduce**

Steps to reproduce the behavior (a minimal script is sketched after this list):

1. Migrate the Hudi version from 0.10.1 to 0.15.0 and start writing to the Hudi table without hive sync.
2. Add the hive meta sync related properties to the Hudi properties.
3. Write to the Hudi table via Spark jobs.
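To make the sequence concrete, a hypothetical minimal script under the same assumptions as the sketch above; `table_path`, the schema, and the row values are made up, and `base_options` stands for the first property set shown earlier:

```python
from pyspark.sql import SparkSession

# Hypothetical minimal repro; table_path, the schema, and the values are
# placeholders, and base_options stands for the first property set above.
spark = SparkSession.builder.appName('hudi-11955-repro').getOrCreate()
table_path = 's3://bucket/path/to/table'

df = spark.createDataFrame(
    [('k1', '2024-09-01', '2024-09-01T00:00:00Z')],
    ['record_key', 'txn_date', 'ingestion_timestamp'],
)

# Step 1: on Hudi 0.15.0 (table originally written by 0.10.1), write
# without hive sync enabled.
df.write.format('hudi').options(**base_options).mode('append').save(table_path)

# Step 2: add the hive meta sync properties.
sync_options = {
    **base_options,
    'hoodie.datasource.meta.sync.enable': 'true',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.database.name': '##db_name##',
    'hoodie.datasource.hive_sync.metastore.uris': '##hive_uri##',
}

# Step 3: write again. The first such write succeeded for us; every write
# after that fails with the NullPointerException in the stacktrace below.
df.write.format('hudi').options(**sync_options).mode('append').save(table_path)
```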
**Expected behavior**

* The job should complete successfully.
* The Hudi table should be updated with the new data.
* The Hudi properties should include `hoodie.database.name`.
* Metadata should be synced to Hive successfully.

**Environment Description**

* Hudi version : 0.15.0
* Spark version : 3.3.2
* Hive version : 3.1.2
* Hadoop version : 3.0.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Stacktrace**

```
An error occurred while calling o1417.save.
: org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1015)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1013)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1112)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:82)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:79)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:256)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:165)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:256)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$9(SQLExecution.scala:258)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:448)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:203)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1073)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:131)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:398)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:255)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:238)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:251)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:244)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:339)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:335)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:244)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:395)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:244)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:198)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:189)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:305)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:964)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:429)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:396)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:250)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing LH820YOhaC01kt
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170)
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
	... 58 more
Caused by: java.lang.NullPointerException
	at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$null$5(TimelineUtils.java:114)
	at java.util.HashMap.forEach(HashMap.java:1290)
	at org.apache.hudi.common.table.timeline.TimelineUtils.lambda$getDroppedPartitions$6(TimelineUtils.java:113)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at org.apache.hudi.common.table.timeline.TimelineUtils.getDroppedPartitions(TimelineUtils.java:110)
	at org.apache.hudi.sync.common.HoodieSyncClient.getDroppedPartitionsSince(HoodieSyncClient.java:97)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:289)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:179)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167)
	... 59 more
```
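Possibly useful for triage: judging from the frames above (not from reading the 0.15.0 source), the NPE surfaces while `TimelineUtils.getDroppedPartitions` iterates commit metadata on the timeline. A hypothetical diagnostic to see which instants the sync path would scan, listing the `.hoodie` timeline files through the JVM filesystem API; `spark` is an active SparkSession and `table_path` is a placeholder:

```python
# Hypothetical diagnostic: list instant files in the .hoodie timeline via the
# JVM filesystem API; table_path is a placeholder for the table's S3 base path.
hadoop_conf = spark._jsc.hadoopConfiguration()
timeline_path = spark._jvm.org.apache.hadoop.fs.Path(table_path + '/.hoodie')
fs = timeline_path.getFileSystem(hadoop_conf)
for status in fs.listStatus(timeline_path):
    name = status.getPath().getName()
    # getDroppedPartitions appears (per the stacktrace) to walk completed
    # commits, so replacecommit instants seem like the first thing to inspect.
    if 'replacecommit' in name:
        print(name)
```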
