[
https://issues.apache.org/jira/browse/HUDI-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-2089:
---------------------------------
Labels: pull-request-available (was: )
> fix the bug that metatable cannot support non_partition table
> -------------------------------------------------------------
>
> Key: HUDI-2089
> URL: https://issues.apache.org/jira/browse/HUDI-2089
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Affects Versions: 0.8.0
> Environment: spark3.1.1
> hive3.1.1
> hadoop 3.1.1
> Reporter: tao meng
> Assignee: tao meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When we enable the metadata table for a non-partitioned Hudi table, the
> following error occurs:
> org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
>   at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:447)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:433)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:187)
> We use Hudi 0.8, but the problem also occurs with the latest Hudi code.
> Steps to reproduce:
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.functions.{expr, lit}
>
> // basePath is the target table location; its value is not given in this report.
> val df = spark.range(0, 1000).toDF("keyid").
>   withColumn("col3", expr("keyid")).
>   withColumn("age", lit(1)).
>   withColumn("p", lit(2))
>
> // Initial insert into a non-partitioned COW table with the metadata table enabled
> df.write.format("hudi").
>   option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY,
>     DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>   option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>   option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
>     "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
>   option("hoodie.insert.shuffle.parallelism", "4").
>   option("hoodie.metadata.enable", "true").
>   option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
>   mode(SaveMode.Overwrite).save(basePath)
> // Upsert the same records again
> df.write.format("hudi").
>   option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY,
>     DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>   option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>   option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
>     "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
>   option("hoodie.insert.shuffle.parallelism", "4").
>   option("hoodie.metadata.enable", "true").
>   option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
>   mode(SaveMode.Append).save(basePath)
>
> org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
>   at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:447)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:433)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:187)
>   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
>   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:564)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:230)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>